ERRATA 
THE ANNALS OF MATHEMATICAL STATISTICS 
Volume VI, No. 1, March, 1935 


On page 25, in Directions for Use of the Tables, p = q should read p > q, Qi = 21 + qu 
should read Q; = x0 + qn, Ds = x22 + qn should read Dy = x20 + qn. In the tables of 
values of x under p = .97, n = 25, instead of —.784 the number should be —.754. 
















AN APPLICATION OF ORTHOGONALIZATION PROCESS TO THE 
THEORY OF LEAST SQUARES 


By Y. K. Wong
Introduction 
The present paper is an outgrowth of the writer’s attempt to fill a lacuna in the 
discussion of the Gauss method of substitution as given by many writers. For 
illustration, let us cite Brunt’s Combination of Observations. In Chapter VI, 


we find: 
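Let the normal equations be

\[
\begin{aligned}
[aa]x + [ab]y + [ac]z - [al] &= 0 \\
[bb]y + [bc]z - [bl] &= 0 \\
[cc]z - [cl] &= 0.
\end{aligned} \tag{i}
\]

From this equation we find

\[ x = -\frac{[ab]}{[aa]}\,y - \frac{[ac]}{[aa]}\,z + \frac{[al]}{[aa]}. \tag{ii} \]

Substituting, we obtain

\[
\begin{aligned}
[bb.1]y + [bc.1]z - [bl.1] &= 0 \\
[cc.1]z - [cl.1] &= 0
\end{aligned} \tag{iii}
\]

where

\[ [bb.1] = [bb] - [ab][ab]/[aa], \text{ etc.} \tag{iv} \]

From the first equation in (iii),

\[ y = -\frac{[bc.1]}{[bb.1]}\,z + \frac{[bl.1]}{[bb.1]}. \tag{v} \]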
In connection with equations (ii) and (v), the question naturally arises as to
whether or not these numbers [aa], [bb.1], ... are all different from zero. Since
$[aa] = \sum a_i a_i$, one can see that $[aa] \neq 0$ if $a_i \neq 0$ for every i. However, to show
the non-vanishing of [bb.1], [cc.2], etc. is by no means simple. Many writers do
not give a demonstration on this point. We know that a system of non-homo- 
geneous linear equations has a solution if the system of equations is linearly 
independent. Brunt gives a discussion of the independence of the normal equa- 
tions in Chapter V, Art. 36, but he does not state clearly a condition for inde- 
pendence. He says: “The condition of independence is in general satisfied in 




54 Y. K. WONG 


the problems which arise in practice. We can then proceed to the formation 
and solution of the normal equations.” It is one of the aims of this paper to 
give a necessary and sufficient condition for the independence of the normal 
equations and to show [aa], [bb.1], etc. are all different from zero when the condi- 
tion is satisfied. 
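
As an illustration that is not part of the original text, the following Python sketch carries out the substitution steps (ii) to (v) on a small symmetric system and records the successive leading coefficients [aa], [bb.1], [cc.2]. The function name `gauss_pivots` and the sample matrix are assumptions made for this example only; exact fractions are used so that a vanishing coefficient would be detected rather than hidden by rounding.

```python
from fractions import Fraction

def gauss_pivots(S):
    """Return the successive leading coefficients [aa], [bb.1], [cc.2], ...

    S is the square matrix of coefficients of the normal equations.
    Each step eliminates one unknown, as in (ii) and (iv).
    """
    n = len(S)
    M = [[Fraction(x) for x in row] for row in S]
    pivots = []
    for k in range(n):
        pivot = M[k][k]
        if pivot == 0:
            # This is exactly the case the paper wants to rule out.
            raise ZeroDivisionError("leading coefficient %d vanishes" % (k + 1))
        pivots.append(pivot)
        for i in range(k + 1, n):
            f = M[i][k] / pivot
            # e.g. [bb.1] = [bb] - [ab][ab]/[aa]
            M[i] = [x - f * y for x, y in zip(M[i], M[k])]
    return pivots

# Hypothetical coefficients [aa]=2, [ab]=1, [ac]=1, [bb]=2, [bc]=1, [cc]=2.
S = [[2, 1, 1],
     [1, 2, 1],
     [1, 1, 2]]
print(gauss_pivots(S))
```

For this sample matrix all three coefficients are different from zero, so the substitution can be carried through; the paper's question is under what condition on the observations this is guaranteed.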

In the theory of least squares, there is the classical method of the derivation of 
normal equations by an application of the notion of minimum in differential 
calculus. After the normal equations are secured, the Gauss method of substi- 
tution is applied to obtain the solution. Doolittle modifies the Gauss method of 
substitution so as to facilitate the labor of computation. However, when the 
number of parameters (or unknowns) exceeds 4, Doolittle’s method is quite 
complicated. In the present paper the writer wishes to present a mathematical 
discussion of a method obtained through an application of the Gram-Schmidt 
orthogonalization process. This method furnishes us a new procedure for deter- 
mining the most probable values of the parameters (or unknowns). The formu- 
lation of the system of normal equations will be omitted in this new procedure, 
which is particularly effective in fitting curves to time series. The paper can 
be roughly divided into three parts. The first part gives an algebraic derivation 
of the normal equations. The second part derives a condition for a set of 
observation data so that the Gauss method of substitution is applicable. The 
third part gives a relationship between the Gauss method of substitution and the 
orthogonalization process. A practical application of the results of this paper 
will be found in a later paper. 

The process of orthogonalization was used in the 19th century, and has
been applied extensively in the theory of integral equations and linear
transformations in Hilbert space. In classical analysis, if $\varphi_1(x), \varphi_2(x), \ldots$, defined
on (0, 1), is a normally orthogonalized system, and if f(x), defined on (0, 1), is
such that $f^2$ is Lebesgue integrable, then the system of Fourier coefficients

\[ f_i = \int_0^1 f(x)\,\varphi_i(x)\,dx \]


has certain interesting properties, one of which concerns the integral

\[ \int_0^1 \Bigl( f(x) - \sum_i f_i\,\varphi_i(x) \Bigr)^2 dx. \]


The preceding notion has a close connection with the theory of least squares as 
outlined in many texts on statistics. In section III, the reader will find how
this notion is applied in the derivation of the normal equations. Since the 
number of dimensions is finite, the integration process reduces to a summation 
process and furthermore no limiting process is used. This new derivation of 
normal equations has the advantage that (1) differential calculus is not used, 
(2) a new form of normal equations is obtained, (3) the solution of the unknowns 
or parameters can be immediately obtained without further application of the 





ORTHOGONALIZATION PROCESS AND THEORY OF LEAST SQUARES 55 


Gauss Method of Substitution or the Doolittle Method, and (4) the formula for 
the “quadratic residual” is obtained as a simple corollary.

From the results in section III, we see immediately what condition should be
imposed upon the set of observation data so that the Gauss method of substitution
may be applicable. In section VI, we find a necessary and sufficient condition
for the independence of the system of normal equations (3.9), and also the
fact that when this condition is fulfilled, then, due to the special nature of the
coefficients of the unknowns, we see that the matrix is properly positive. It is
on account of this fact that we are able to show that the numbers [aa], [bb.1], etc.
are all different from zero. The demonstration of this point is found in section
VII. In this section, we lay down a fundamental hypothesis for Gauss’s method
of substitution, namely, the set of observations $A_i = (a_{i1}, \ldots, a_{in})$, $i = 1, 2, \ldots, r$,
is linearly independent. Lemma 7.3 may be called the fundamental
lemma for Gauss’s method of substitution. Some interesting properties of the
numbers $[A_s A_t . h]$, where $s, t = 1, \ldots, r$, and h is less than the smaller one of
(s, t), are demonstrated.

From the properties of the numbers $[A_s A_t . h]$, where $s, t = 1, \ldots, r$ and h is
less than the smaller one of (s, t), and in comparison of the system of equations
(3.7°) with the final form of equations obtained through the application
of the Gauss method of substitution, we can see the relationship between the 
Gauss method and the Gram-Schmidt orthogonalization process. If we should 
like to give some credit to Gauss, we may say that the orthogonalization proc- 
ess was known by him, but was stated in a different form. 

The writer wishes to remark that certain theorems together with proofs in 
section II, IV, V and VI are obtained from E. H. Moore’s lecture notes. How- 
ever the writer should be responsible for any defect. Finally, I should empha- 
size that the use of the notion of positive matrices is only for convenience. 


I. Vectors, Inner Products, and Linear Independence 
In this paper, we shall consider vectors of the form¹

\[ (v_1, v_2, \ldots, v_n). \tag{1.10} \]


For convenience, we shall use capital letters to denote vectors of the type 
(1.10). 

Let $V = (v_1, v_2, \ldots, v_n)$ and $U = (u_1, u_2, \ldots, u_n)$; then we say V = U if
$v_i = u_i$ for every i.

We define V + U by

\[ V + U = (v_1 + u_1, v_2 + u_2, \ldots, v_n + u_n), \tag{1.11} \]

and sV, where s is a number, by

\[ sV = (sv_1, sv_2, \ldots, sv_n). \tag{1.12} \]


1 If we write $v_i$ as v(i), where i = 1, 2, ..., n, then v(i) may be considered as a function
of one variable whose range consists of a set of positive integers, (1, 2, ..., n). E. H.
Moore defines a vector as a function of one variable.














Hence, sV = Vs. In particular, when s = -1, we shall put -V = (-1)V.
Then U — V becomes a special instance of (1.11) and (1.12). 
From (1.11) and (1.12), we see that addition is commutative and associative. 


INNER PRODUCTS: The inner product of two vectors $V = (v_1, \ldots, v_n)$ and
$U = (u_1, \ldots, u_n)$ is defined² to be

\[ (V, U) = \sum_{i=1}^{n} v_i u_i. \tag{1.2} \]

The norm of a vector V is defined by n(V) = (V, V); and the modulus of a
vector V is defined by $\operatorname{mod}(V) = +\sqrt{n(V)}$.
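
The three definitions can be written down directly. The sketch below is an illustration only; the function names `inner`, `norm` and `mod` are chosen here to mirror the paper's symbols and are not taken from any library. Note that the paper's norm n(V) is the squared length, and the modulus is its positive square root.

```python
import math

def inner(v, u):
    """Inner product (V, U) = sum of v_i * u_i, as in (1.2)."""
    if len(v) != len(u):
        raise ValueError("vectors must have the same number of components")
    return sum(vi * ui for vi, ui in zip(v, u))

def norm(v):
    """Norm n(V) = (V, V); this is the squared length of V."""
    return inner(v, v)

def mod(v):
    """Modulus mod(V) = +sqrt(n(V))."""
    return math.sqrt(norm(v))

V = [3, 4]
U = [1, 2]
print(inner(V, U), norm(V), mod(V))
```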
From (1.11), (1.12), and (1.2), we can easily prove the following theorem: 
THEOREM 1. The symbol ( , ) has the following properties:
(S) $(U, V) = (V, U)$ for every V, U; (symmetric property)
(L_1) $(sV, U) = s(V, U) = (V, sU)$ for every V, U and every number s;
(L_2) $(U, (V + W)) = (U, V) + (U, W)$ for every U, V, W; (linear property)
(P) $(V, V) \ge 0$ for every V; (positive property)
(P_0) $(V, V) = 0$ if and only if V is a zero vector. (properly positive property)





LINEAR INDEPENDENCE. A set of vectors $V_1, \ldots, V_r$ is said to be linearly
dependent in case there exist constants $c_1, \ldots, c_r$ not all equal to 0 such that

\[ c_1 V_1 + \cdots + c_r V_r = 0, \]

where 0 is a zero vector.
A set of vectors $V_1, \ldots, V_r$ is said to be linearly independent in case, if the
constants $c_1, \ldots, c_r$ satisfy

\[ c_1 V_1 + \cdots + c_r V_r = 0, \]

each constant $c_i = 0$.
THEOREM 2. If the set $V_1, \ldots, V_r$ is linearly independent, then none of the
vectors is a zero vector, and hence the norm of every vector must be different from zero.
For if $V_s$ is a zero vector, then set $c_s = 1$, and $c_i = 0$ for $i \neq s$. It is obvious
that

\[ 0 \cdot V_1 + \cdots + 0 \cdot V_{s-1} + 1 \cdot V_s + 0 \cdot V_{s+1} + \cdots + 0 \cdot V_r = 0, \]

which shows that the set of vectors $V_1, \ldots, V_r$ is linearly dependent, contradictory
to the hypothesis.

A more general theorem is stated in 

THEOREM 3. If the set $V_1, \ldots, V_r$ is linearly independent, then every subset³
is also linearly independent.

We shall prove this theorem by a contrapositive form. The contrapositive
form is as follows: If in the set $V_1, \ldots, V_r$ there exists a subset which is linearly


2 The notation ( , ) was introduced by D. Hilbert. In treatises on least squares, the
notation [ ] is used. The present writer reserves the latter notation for other purposes.

3 Consider a set of integers (1, 2, ..., n). Then any combination of this set of n distinct
integers taken $r \le n$ at a time is called a subset of the set (1, 2, ..., n). Likewise, we call
any combination of the set of vectors $V_1, V_2, \ldots, V_n$ taken $r \le n$ at a time a subset of the
whole set.









dependent, then the whole set is also linearly dependent. Without losing any
generality, let us suppose the subset $V_1, \ldots, V_s$ ($s \le r$) to be linearly dependent.
Then there exist $c_1, \ldots, c_s$, not all equal to 0, such that

\[ c_1 V_1 + \cdots + c_s V_s = 0. \]

If s = r, then the whole set is linearly dependent. If s < r, then let $c_i = 0$
for $i = s+1, s+2, \ldots, r$. Then

\[ \sum_{i=1}^{r} c_i V_i = 0, \]

which shows the whole set is linearly dependent.
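THEOREM 4.⁴ A necessary and sufficient condition for the set $V_i = (v_{i1}, \ldots, v_{in})$,
$i = 1, \ldots, r$ to be linearly independent is that there exists a non-vanishing determinant
of order r in the array

\[
\begin{matrix}
v_{11} & v_{12} & \cdots & v_{1n} \\
v_{21} & v_{22} & \cdots & v_{2n} \\
\vdots & \vdots &        & \vdots \\
v_{r1} & v_{r2} & \cdots & v_{rn}
\end{matrix}
\]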


II. Gram-Schmidt’s Orthogonalization Process 


For the present section and the sequel, we shall adopt the notation
$A_i = (a_{i1}, \ldots, a_{in})$, $B_i = (b_{i1}, \ldots, b_{in})$, and $C_i = (c_{i1}, \ldots, c_{in})$ for $i = 1, 2, \ldots, r$.

THEOREM 5. For every set of vectors $A_1, \ldots, A_r$, there exists uniquely a set of
vectors $B_1, \ldots, B_r$ such that

5.1) $(B_t, B_s) = 0$ ($t \neq s$).

5.2) For every t satisfying the relation $1 \le t \le r$, $A_t$ is a linear combination
of $B_1, \ldots, B_t$; and $B_t$ is a linear combination of $A_1, \ldots, A_t$.

5.3) $B_1 = A_1$; and for t > 1, $(B_t - A_t)$ is a linear combination of
$B_1, \ldots, B_{t-1}$, and is also a linear combination of $A_1, \ldots, A_{t-1}$.

5.4) If t > 1, then $(A_s, B_t) = 0$ for every s < t.

5.5) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t)$ for every t.

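To prove this theorem, let us define

\[
\begin{aligned}
B_1 &= A_1, \\
B_2 &= \begin{cases} A_2 & \text{if } n(B_1) = 0, \\[4pt] A_2 - \dfrac{(A_2, B_1)}{n(B_1)}\,B_1 & \text{if } n(B_1) \neq 0, \end{cases} \\
&\;\;\vdots \\
B_t &= A_t - \sum_{s=1}^{t-1} h_{ts} B_s,
\end{aligned} \tag{2.1}
\]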


4 See Dickson, Modern Algebraic Theories, p. 55; Bocher, Higher Algebra, p. 36. 














where

\[
h_{ts} = \begin{cases} (A_t, B_s)/n(B_s), & \text{if } n(B_s) \neq 0, \\ 0, & \text{if } n(B_s) = 0. \end{cases} \tag{2.11}
\]
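
The construction (2.1) with the coefficients (2.11) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's own computation: the function name `orthogonalize` and the sample vectors are hypothetical, and exact fractions are used so that a zero norm is recognized exactly. The third sample vector is deliberately the sum of the first two, so that the case n(B_s) = 0 in (2.11) is exercised.

```python
from fractions import Fraction

def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(x * y for x, y in zip(v, u))

def orthogonalize(A):
    """Build B_1, ..., B_r from A_1, ..., A_r by (2.1) and (2.11)."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            n_prev = inner(prev, prev)
            # (2.11): h_ts = (A_t, B_s)/n(B_s), or 0 when n(B_s) = 0
            h = inner(a, prev) / n_prev if n_prev != 0 else Fraction(0)
            b = [bi - h * pi for bi, pi in zip(b, prev)]
        B.append(b)
    return B

# A_3 = A_1 + A_2, so the set is linearly dependent and B_3 is a zero vector.
A = [[1, 1, 0], [1, 0, 1], [2, 1, 1]]
B = orthogonalize(A)
for b in B:
    print(b)
```

In the sample, B_1 and B_2 are orthogonal and B_3 vanishes, in agreement with property 5.1) and with the Corollary to Theorem 6.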


We proceed to show that this set has the properties stated in the theorem.
To prove 5.1), let us suppose t < s. This assumption is permissible since the
operator ( , ) has the symmetric property. First, if $A_1 = 0$, then $B_1 = 0$, and

\[ (B_1, B_2) = (A_1, A_2) = (0, A_2) = 0. \]

Secondly, if $A_1 \neq 0$, then $B_1 \neq 0$ and

\[
\begin{aligned}
(B_1, B_2) &= (A_1, A_2 - h_{21} B_1) = (A_1, A_2) - \frac{(A_2, B_1)}{n(B_1)}\,(A_1, B_1) \\
&= (A_1, A_2) - (A_1, A_1)(A_2, A_1)/n(A_1) = 0.
\end{aligned}
\]

Assume 5.1) holds for all indices not exceeding s - 1; then, for t < s,

\[ (B_t, B_s) = \Bigl(B_t,\; A_s - \sum_{i=1}^{s-1} h_{si} B_i\Bigr) = (B_t, A_s) - \sum_{i=1}^{s-1} h_{si}\,(B_t, B_i). \]

The sum on the right hand side reduces to $h_{st}(B_t, B_t)$, since the other terms
vanish by assumption. Now if $(B_t, B_t) \neq 0$ then by (2.11), $h_{st}(B_t, B_t) = (A_s, B_t)$,
and by the symmetric property of ( , ), we obtain

\[ (B_t, B_s) = (B_t, A_s) - (A_s, B_t) = 0. \]

If $(B_t, B_t) = 0$, then by the $P_0$-property of ( , ), we find that $B_t$ is a zero
vector, and hence $(B_t, B_s) = 0$.

5.2) follows from the definition of $B_t$.

That $(A_t - B_t)$ is a linear combination of $B_1, \ldots, B_{t-1}$ for t > 1 follows
from the definition of $B_t$. Since each of $B_1, \ldots, B_{t-1}$ is a linear combination of
$A_1, \ldots, A_{t-1}$, we secure the second part of 5.3).

By 5.2), we can determine $g_{si}$ such that $A_s = \sum_{i=1}^{s} g_{si} B_i$. Thus for every
s < t, we have by 5.1)

\[ (A_s, B_t) = \Bigl(\sum_{i=1}^{s} g_{si} B_i,\; B_t\Bigr) = \sum_{i=1}^{s} g_{si}\,(B_i, B_t) = 0. \]

By 5.3), there exist $g_{ti}$ such that $A_t - B_t = \sum_{i=1}^{t-1} g_{ti} B_i$ and hence
$A_t = B_t + \sum_{i=1}^{t-1} g_{ti} B_i$. Thus by 5.1), we have

\[ (A_t, B_t) = \Bigl(B_t + \sum_{i=1}^{t-1} g_{ti} B_i,\; B_t\Bigr) = (B_t, B_t) + \sum_{i=1}^{t-1} g_{ti}\,(B_i, B_t) = (B_t, B_t). \]

By the symmetric property of ( , ), we secure $(A_t, B_t) = (B_t, A_t)$.







For the proof of uniqueness, let us suppose there exists a second set of vectors
$B'_1, \ldots, B'_r$ having the properties 5.1), 5.2), 5.3), 5.4), and 5.5). By 5.3), we
see that $B_1 = A_1 = B'_1$. Assuming the uniqueness holds true for r = t, we
proceed to show that it is also true for r = t + 1. By 5.3) there exist constants
$s_i, s'_i$ ($i = 1, \ldots, t$) such that

\[ B_{t+1} = A_{t+1} + \sum_{i=1}^{t} s_i A_i, \qquad B'_{t+1} = A_{t+1} + \sum_{i=1}^{t} s'_i A_i, \]

so that

\[ B_{t+1} - B'_{t+1} = \sum_{i=1}^{t} (s_i - s'_i) A_i. \]

From this, we secure

\[
\begin{aligned}
(B_{t+1} - B'_{t+1},\; B_{t+1} - B'_{t+1}) &= \Bigl(B_{t+1} - B'_{t+1},\; \sum_{i=1}^{t} (s_i - s'_i) A_i\Bigr) \\
&= \sum_{i=1}^{t} (s_i - s'_i)\,(B_{t+1} - B'_{t+1},\; A_i) = 0,
\end{aligned}
\]

by virtue of 5.4). Hence by the $P_0$-property of ( , ), we have $B_{t+1} - B'_{t+1} = 0$
and hence $B_{t+1} = B'_{t+1}$.

The set $B_1, \ldots, B_r$ with the properties stated in Theorem 5 is called the
orthogonalized set of $A_1, \ldots, A_r$. This process is called Gram-Schmidt’s
orthogonalization process.

The set $B_1, \ldots, B_r$ is called the normally orthogonalized set of $A_1, \ldots, A_r$ in
case the former set enjoys the properties 5.1), 5.2), 5.3), 5.4), and if

5.5n) $(A_t, B_t) = (B_t, B_t) = (B_t, A_t) = 1$ for every t.


THEOREM 6. If a subset $A_{k_1}, \ldots, A_{k_m}$ ($1 \le k_1 \le \cdots \le k_m \le r$) in the set
$A_1, \ldots, A_r$ is linearly independent, then there is a subset $B_{k_1}, \ldots, B_{k_m}$ which
has the properties stated in Theorem 5, and it is also linearly independent.

Let $h = k_m - k_1 + 1$. To prove the theorem, we may assume $k_1, \ldots, k_m$ to
be $1, \ldots, h \le r$, for otherwise we may renumber the vectors. We construct
the B vectors in the same way as given in equations (2.1) and (2.11). By
Theorem 5, we have

\[ B_1 = A_1, \qquad B_s = A_s + \sum_{i=1}^{s-1} g_{si} A_i \quad (s = 2, \ldots, h). \tag{2.2} \]

Suppose the constants $c_1, \ldots, c_h$ be such that

\[ c_1 B_1 + \cdots + c_h B_h = 0. \]





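Then by (2.2), we secure

\[
\begin{aligned}
0 &= c_1 A_1 + \sum_{s=2}^{h} c_s B_s = c_1 A_1 + \sum_{s=2}^{h} c_s \Bigl(A_s + \sum_{i=1}^{s-1} g_{si} A_i\Bigr) \\
&= (c_1 + c_2 g_{21} + \cdots + c_h g_{h1}) A_1 + (c_2 + c_3 g_{32} + \cdots + c_h g_{h2}) A_2 + \cdots + c_h A_h.
\end{aligned}
\]

Since $A_1, \ldots, A_h$ are linearly independent, we have

\[
\begin{aligned}
c_1 + c_2 g_{21} + \cdots + c_h g_{h1} &= 0, \\
c_2 + \cdots + c_h g_{h2} &= 0, \\
&\;\;\vdots \\
c_h &= 0.
\end{aligned} \tag{2.3}
\]

But the determinant of the coefficients of $c_i$ ($i = 1, \ldots, h$) is

\[
\begin{vmatrix}
1 & g_{21} & g_{31} & \cdots & g_{h1} \\
0 & 1 & g_{32} & \cdots & g_{h2} \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{vmatrix} = 1.
\]

Hence by a theorem in the theory of equations,⁵ the only solution that satisfies
(2.3) is that $c_1 = c_2 = \cdots = c_h = 0$. Thus the subset $B_1, \ldots, B_h$ is linearly
independent.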

COROLLARY. The orthogonalized set $B_1, \ldots, B_r$ is linearly independent if and
only if the set $A_1, \ldots, A_r$ is linearly independent.

THEOREM 7a. If a set of vectors $A_1, \ldots, A_r$ is linearly independent, then the
set can be normally orthogonalized.

Let $B_i$ be the orthogonalized set of $A_i$. Since $A_i$ is a linearly independent set,
then the set $B_i$ is also linearly independent by Theorem 6. Hence by Theorem
2, the norm of every vector $B_i$ is non-vanishing. Define $C_i = B_i/\operatorname{mod}(B_i)$.
Then this set $C_i$ enjoys the properties 5.1), 5.2), 5.3), 5.4) and 5.5n).
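
The normalization step $C_i = B_i/\operatorname{mod}(B_i)$ may be sketched as follows. The function name `normalize` and the sample vectors are assumptions of this example; the input is taken to be an already orthogonalized set with non-vanishing norms, and floating-point arithmetic is used, so the relations hold only up to rounding.

```python
import math

def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(x * y for x, y in zip(v, u))

def normalize(B):
    """Return C_i = B_i / mod(B_i) for an orthogonalized set B."""
    C = []
    for b in B:
        m = math.sqrt(inner(b, b))
        if m == 0:
            # By Theorem 2 this cannot happen for a linearly independent set.
            raise ValueError("a zero vector cannot be normalized")
        C.append([x / m for x in b])
    return C

# Hypothetical orthogonalized set.
B = [[3.0, 4.0, 0.0], [-4.0, 3.0, 0.0], [0.0, 0.0, 2.0]]
C = normalize(B)
print(C)
```

Each resulting vector has unit norm and distinct vectors remain orthogonal, which is condition (2.4) of Theorem 7b.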

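THEOREM 7b. If a set of vectors $V_1, \ldots, V_r$ is normally orthogonal, i.e. if

\[ (V_i, V_j) = \begin{cases} 1 & (i = j) \\ 0 & (i \neq j), \end{cases} \tag{2.4} \]

then $V_1, \ldots, V_r$ is linearly independent.
For suppose

\[ c_1 V_1 + \cdots + c_r V_r = 0. \]

Then, for every j,

\[ \sum_{i=1}^{r} c_i\,(V_i, V_j) = 0. \]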


5 Dickson, First Course in the Theory of Equations (1922), p. 119. 







By condition (2.4), the preceding expression reduces to

\[ c_j = 0, \]

which shows the linear independence of $V_1, \ldots, V_r$.


III. Algebraic Derivation of the Normal Equations 


Consider a linear function

\[ l = p_1 x_1 + p_2 x_2 + \cdots + p_r x_r = \sum_{i=1}^{r} p_i x_i. \tag{3.1} \]

Let the set of observations of $x_i$ and l be

\[ A_i = (a_{i1}, \ldots, a_{in}), \qquad L = (l_1, \ldots, l_n) \tag{3.2} \]

respectively; then the residual $v_j$ is

\[ v_j = \sum_{i=1}^{r} p_i a_{ij} - l_j. \]

In vector notation,

\[ V = \sum_{i=1}^{r} p_i A_i - L. \]
j= 


The theory of least squares requires us to find the values for $p_1, \ldots, p_r$ so as to
make (V, V) a minimum, or

\[ \Bigl(\sum_{i} p_i A_i - L,\; \sum_{i} p_i A_i - L\Bigr) = \text{a minimum}. \tag{3.3°} \]

Let $A_1, \ldots, A_r$ be linearly independent. By Theorem 7a, the vectors $A_1, \ldots, A_r$
can be normally orthogonalized. Let $C_1, \ldots, C_r$ be the normally orthogonal
set. Then every $A_t$ ($t = 1, \ldots, r$) is expressible as a linear combination
of $C_1, \ldots, C_r$. Let us write

\[ \sum_{i=1}^{r} p_i A_i = \sum_{i=1}^{r} k_i C_i. \tag{3.3} \]
1 1 


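Our problem now is equivalent to that of finding the values $k_i$ ($i = 1, \ldots, r$) so as
to render the inner product

\[ \Bigl(\sum_{i} k_i C_i - L,\; \sum_{i} k_i C_i - L\Bigr) \tag{3.4} \]

a minimum. Expression (3.4) can be written in the form

\[
\begin{aligned}
&(L, L) - 2\sum_{i} (L, C_i)\,k_i + \sum_{i}\sum_{j} (k_i C_i,\, k_j C_j) \\
&\quad= (L, L) - 2\sum_{i} (L, C_i)\,k_i + \sum_{i} k_i^2 \\
&\quad= (L, L) - \sum_{i} (L, C_i)^2 + \sum_{i} \bigl(k_i - (C_i, L)\bigr)^2.
\end{aligned} \tag{3.5}
\]

Hence (3.4) gives a minimum if and only if the last summation vanishes, i.e.,

\[ k_i = (C_i, L) \qquad (i = 1, \ldots, r). \tag{3.6} \]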














The Bessel’s inequality

\[ \sum_{i=1}^{r} k_i^2 \le (L, L) \]

is obtained from (3.6), (3.4), and (3.5).
To solve for $p_i$, we make use of (3.3) and (3.6), and secure

\[ \sum_{i=1}^{r} A_i p_i = \sum_{i=1}^{r} (C_i, L)\,C_i, \]

whence

\[ \Bigl(C_k,\; \sum_{i=1}^{r} A_i p_i\Bigr) = \Bigl(C_k,\; \sum_{i=1}^{r} (C_i, L)\,C_i\Bigr). \]

On the right hand side we have

\[ \Bigl(C_k,\; \sum_{i=1}^{r} (C_i, L)\,C_i\Bigr) = (C_k, L)(C_k, C_k) = (C_k, L), \]

since $(C_k, C_i) = 0$ when $i \neq k$, and $(C_i, C_i) = 1$ when i = k. On the left hand
side, we have

\[ \Bigl(C_k,\; \sum_{j=1}^{r} A_j p_j\Bigr) = \sum_{j=1}^{r} (C_k, A_j)\,p_j = \sum_{j=k}^{r} (C_k, A_j)\,p_j, \]

since $(C_k, A_j) = 0$ when j < k. Hence the values for $p_1, \ldots, p_r$ are given by

\[ \sum_{j=k}^{r} (C_k, A_j)\,p_j = (C_k, L) \qquad (k = 1, \ldots, r), \tag{3.7} \]

where $(C_k, A_k) = (C_k, C_k) = 1$.
Equations (3.7) are called the normal equations, which are derived without
using any notion in differential calculus.
From (3.6) and (3.5), we secure the value for the ‘quadratic residual’ (V, V):

\[ (V, V) = (L, L) - \sum_{i=1}^{r} (L, C_i)^2, \tag{3.8} \]

which is a positive quantity by virtue of the Bessel’s inequality.

Let $B_1, \ldots, B_r$ be an orthogonalized set of $A_1, \ldots, A_r$. Then every vector
$B_i$ has a non-vanishing norm, and $B_i = \operatorname{mod}(B_i) \cdot C_i$. Hence from (3.7) and
(3.8), we have

\[ \sum_{j=k}^{r} (B_k, A_j)\,p_j = (B_k, L) \qquad (k = 1, 2, \ldots, r), \tag{3.7°} \]

\[ (V, V) = (L, L) - \sum_{i=1}^{r} (L, B_i)^2 / n(B_i). \tag{3.8°} \]







Thus we have proved the following

THEOREM 8. Given a linear function (3.1). Let the set of observations of $x_i$
and l be

\[ A_i = (a_{i1}, \ldots, a_{in}), \qquad L = (l_1, \ldots, l_n) \qquad (i = 1, \ldots, r;\; n \ge r) \]



























respectively. Let $A_1, \ldots, A_r$ be linearly independent, $B_1, \ldots, B_r$ be the
orthogonalized set, and $C_1, \ldots, C_r$ the normally orthogonalized set of $A_1, \ldots, A_r$.
Then the set of values $p_1, \ldots, p_r$ will minimize (3.3°) if and only if the system of
equations (3.7°) or (3.7) holds true; in other words, $\sum_{i=1}^{r} p_i A_i - L$ is orthogonal
to $C_j$ or to $B_j$ for every j. The quadratic residual (V, V) is given by (3.8°) or (3.8).

From (3.7), we can secure the solution for $p_1, \ldots, p_r$ immediately without
further application of the Gauss method of substitution.
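
The procedure can be sketched in Python using the orthogonalized set $B_k$ and equations (3.7°) and (3.8°): since the k-th equation involves only $p_k, \ldots, p_r$, the unknowns follow by back substitution starting from $p_r$. This is an illustrative sketch, not the author's own computation; the function names `orthogonalize` and `fit` are hypothetical, the observations are invented for the example, and the $A_i$ are assumed linearly independent.

```python
from fractions import Fraction

def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(x * y for x, y in zip(v, u))

def orthogonalize(A):
    """Orthogonalized set B of a linearly independent set A, per (2.1)."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            h = inner(a, prev) / inner(prev, prev)
            b = [bi - h * pi for bi, pi in zip(b, prev)]
        B.append(b)
    return B

def fit(A, L):
    """Solve (3.7°) by back substitution; return (p, quadratic residual)."""
    B = orthogonalize(A)
    r = len(A)
    p = [Fraction(0)] * r
    for k in range(r - 1, -1, -1):
        rhs = inner(B[k], L) - sum(inner(B[k], A[j]) * p[j] for j in range(k + 1, r))
        p[k] = rhs / inner(B[k], A[k])   # (B_k, A_k) = n(B_k) by 5.5)
    # (3.8°): (V, V) = (L, L) - sum (L, B_i)^2 / n(B_i)
    vv = inner(L, L) - sum(inner(L, b) ** 2 / inner(b, b) for b in B)
    return p, vv

# Invented data: fit l = p1 + p2*x to the points (0,1), (1,3), (2,4), (3,7).
A = [[1, 1, 1, 1], [0, 1, 2, 3]]
L = [1, 3, 4, 7]
p, vv = fit(A, L)
print(p, vv)
```

For these data the sketch gives p_1 = 9/10, p_2 = 19/10 and a quadratic residual of 7/10, without the Gramian matrix of the $A_i$ ever being formed.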

The proof of the following theorem does not make use of the orthogonalization
process.⁶

THEOREM 8°. Let $F = \sum p_i A_i$, where every $A_i$ is not a zero vector. The set of
values $p_1, \ldots, p_r$ will minimize (3.3°) if and only if $(F - L, A_i) = 0$ for every
i, i.e., F - L is orthogonal to every $A_i$.

The condition is necessary. To prove this, we show that if it is not the case that
$(F - L, A_i) = 0$ for every i, then we can find another set $q_1, \ldots, q_r$ such that
$n(F - L) > n(G - L)$, where $G = \sum q_i A_i$. For in that case we can
find a vector $A_s$ such that $(F - L, A_s) \neq 0$. Since $A_s \neq 0$, we let
$e = (F - L, A_s)/n(A_s)$ and $G = F - eA_s = \sum q_i A_i$. Then

\[ n(G - L) = n(F - eA_s - L) = n(F - L) - (F - L, A_s)^2 / n(A_s), \]

which shows that $n(G - L) < n(F - L)$.

To prove the sufficiency, we show that for every set $q_1, \ldots, q_r$ different from
$p_1, \ldots, p_r$, $n(G - L) \ge n(F - L)$, where $G = \sum q_i A_i$. Let $s_i = q_i - p_i$,
and $H = \sum s_i A_i$. Then G = F + H. Now if $(F - L, A_i) = 0$ for every i, it
follows that

\[ (F - L, H) = \sum_{i=1}^{r} (F - L, A_i)\,s_i = 0. \]

Hence

\[ n(G - L) = n(F - L) + n(H). \]

Since $n(H) \ge 0$, we have $n(G - L) \ge n(F - L)$.

The preceding theorem does not require the linear independence of the
vectors $A_1, \ldots, A_r$. By Theorem 7a and 7b we see that it is necessary and
sufficient for the set $A_1, \ldots, A_r$ to be linearly independent in order to solve the
equations $(F - L, A_i) = 0$ ($i = 1, 2, \ldots, r$), or

\[
\begin{aligned}
(A_1, A_1)p_1 + (A_1, A_2)p_2 + \cdots + (A_1, A_r)p_r &= (A_1, L) \\
&\;\;\vdots \\
(A_r, A_1)p_1 + (A_r, A_2)p_2 + \cdots + (A_r, A_r)p_r &= (A_r, L).
\end{aligned} \tag{3.9}
\]
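
As a check on Theorem 8°, the following sketch forms the system above from the inner products, solves it with exact fractions, and verifies that the residual F - L is orthogonal to every $A_i$. It is an illustration only: the helper `solve` is a plain Gauss-Jordan elimination written for this example (it assumes non-vanishing pivots, which is the property the later sections establish for linearly independent observations), and the data are invented.

```python
from fractions import Fraction

def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(x * y for x, y in zip(v, u))

def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions (assumes nonzero pivots)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for i in range(n):
        pivot = aug[i][i]
        aug[i] = [x / pivot for x in aug[i]]
        for k in range(n):
            if k != i:
                f = aug[k][i]
                aug[k] = [x - f * y for x, y in zip(aug[k], aug[i])]
    return [row[-1] for row in aug]

# Invented observations: A_1 = (1,1,1,1), A_2 = (0,1,2,3), L = (1,3,4,7).
A = [[1, 1, 1, 1], [0, 1, 2, 3]]
L = [1, 3, 4, 7]

gram = [[inner(ai, aj) for aj in A] for ai in A]   # coefficients (A_i, A_j)
rhs = [inner(ai, L) for ai in A]                   # right-hand sides (A_i, L)
p = solve(gram, rhs)

F = [sum(pi * ai[j] for pi, ai in zip(p, A)) for j in range(len(L))]
residual = [f - l for f, l in zip(F, L)]
print(p, [inner(residual, a) for a in A])
```

The printed inner products of the residual with $A_1$ and $A_2$ are both zero, as the theorem requires.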


6 The proof is based on the same type of reasoning as used by Jackson. See Dunham 
Jackson’s Theory of Approximation, pp. 151-152. 























If $A_1, \ldots, A_r$ are linearly independent, the conclusion in Theorem 8° can be
deduced from Theorem 8. For by Theorem 7a) $A_i = \sum_t s_{it} C_t$ and hence

\[ (F - L, A_i) = \Bigl(F - L,\; \sum_{t} s_{it} C_t\Bigr) = \sum_{t} s_{it}\,(F - L, C_t) = 0. \]

Also, Theorem 8 can be deduced from Theorem 8°.


IV. Matrices and Their Reciprocals 


An ordered array of numbers of the form

\[
\alpha = (a_{ij}) =
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots &        & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix} \tag{4.1}
\]

is a matrix. If we write $\alpha(i, j) = a_{ij}$, then the array of numbers (4.1) may be
considered as a function of two variables i, j on the ranges of positive integers
(1, 2, ..., n), (1, 2, ..., m).⁷ Thus a vector is a special instance of a matrix.
We shall use Greek letters to denote matrices throughout this paper unless otherwise
specified. When n = m, i.e. the number of rows is the same as the number
of columns, we have a square matrix. Associated with every n-row square
matrix, $\alpha$, a determinant can be defined, and for simplicity, we shall adopt the
following notation:

\[
D(\alpha) =
\begin{vmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots &        & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}.
\]





An identity matrix, denoted by $\delta = (d_{ij})$, is a square matrix of which the
elements in the principal diagonal are 1 and elsewhere 0, i.e. $d_{ij} = 0$ ($i \neq j$),
$d_{ii} = 1$. A zero matrix, indicated by $\omega$, is one such that every one of its elements
is 0. The transposed matrix, $\alpha'$, of $\alpha$ is formed by interchanging the
rows and columns. We say two matrices $\alpha = (a_{ij})$ and $\beta = (b_{ij})$ are equal in
case $a_{ij} = b_{ij}$ for every i, j. A matrix $\alpha$ is symmetric in case $\alpha' = \alpha$. The j-th
column of $\alpha$ is indicated by $\alpha(., j)$, the i-th row of $\beta$ by $\beta(i, .)$ and the element in
the i-th row and j-th column by $\alpha(i, j)$. Hence $\alpha(i, j) = a_{ij}$.

ADDITION: Let $\alpha$ be a matrix given by (4.1) and $\beta = (b_{ij})$ a matrix of the same
number of rows and columns as $\alpha$. Then

\[ \alpha + \beta = (a_{ij} + b_{ij}). \]


7 E. H. Moore defines a matrix as a function of two variables. 









We note that $\alpha + \beta = \beta + \alpha$. If $\gamma$ is a matrix of the same number of rows and
columns as $\alpha$, then $(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma)$.

MULTIPLICATION: Let $\alpha = (a_{ij})$ be defined by (4.1), and $\beta = (b_{jk})$ be a matrix
of m rows and r columns, then the product $\pi = \alpha\beta$ is defined by

\[ \pi = (p_{ik}) = \Bigl(\sum_{j=1}^{m} a_{ij} b_{jk}\Bigr). \]

Thus $\pi$ is a matrix of n rows and r columns.

The multiplication of two matrices is not necessarily commutative.

If $\alpha$ is a matrix of n rows and m columns, $\beta$ of m rows and r columns, and $\gamma$ of
r rows and s columns, then $\alpha(\beta\gamma) = (\alpha\beta)\gamma$. If $\alpha$ is a matrix of n rows and m
columns, and $\beta$, $\gamma$ are matrices of m rows and r columns, then
$\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma$.

SCALAR MULTIPLICATION: Let s be a number, and $\alpha$ be a matrix of n rows and
m columns, then

\[ s \cdot \alpha = (s a_{ij}) = \alpha \cdot s. \]

Let $\delta_s$ denote a square matrix of n rows in which the elements in the principal
diagonal are s, and 0 elsewhere. Then $\delta_s = s\delta$, where $\delta$ is an n row identity
matrix. We note from the associative law of multiplication that

\[ s\alpha = \delta_s \cdot \alpha = \alpha \cdot \delta_s. \]

In particular, let s = -1, then we have $-1\alpha$. For convenience, we write
$-\alpha = -1\alpha$. From the definition of addition, we obtain a definition of subtraction
for two matrices of the same number of rows and columns.
RECIPROCALS OF MATRICES: Let $\alpha$ be a matrix of n rows and m columns.
Then a matrix $\alpha^{-1}$ of m rows and n columns is said to be a reciprocal of $\alpha$ in case

\[ \alpha \cdot \alpha^{-1} = \delta^{n}, \quad \text{and} \quad \alpha^{-1} \cdot \alpha = \delta^{m}, \]

where $\delta^{n}$, $\delta^{m}$ are identity matrices of order n, m respectively. If a matrix $\alpha$ has
a reciprocal $\alpha^{-1}$, we can prove $\alpha^{-1}$ is unique. It can be shown that when $\alpha$ has a
reciprocal, it must be a square matrix.⁸

A matrix is said to be non-singular in case it has a reciprocal, otherwise it is
said to be singular.⁹ It is evident that every zero matrix is singular, and an
identity matrix is non-singular.

Suppose $\alpha$ is a square matrix of order n. Let us denote the cofactor of the
element $a_{ij}$ of $\alpha$ by $e_{ji}$. Then

\[ \varepsilon = (e_{ij}) \]

is called the adjoint matrix of $\alpha$.


8 For the proof of this statement, see Moore, Vector, Matrices, and Quaternions.
9 This definition is due to E. H. Moore.













If $\alpha$ is symmetric, then $\varepsilon$ is also symmetric. Since
$a_{i1}e_{1j} + \cdots + a_{in}e_{nj} = D(\alpha)$ or 0 according as i = j or $i \neq j$, we secure the following:

THEOREM 9. Let $\alpha$ be a square matrix and $\varepsilon$ its adjoint, then

\[ \alpha\varepsilon = \varepsilon\alpha = [D(\alpha)]\,\delta. \]

THEOREM 10. If the determinant of $\alpha$ is different from zero, then there exists a
reciprocal $\alpha^{-1}$, and $\alpha^{-1} = \operatorname{adj}\alpha / D(\alpha)$.

This theorem follows from Theorem 9.

The converse of Theorem 10 is also true.
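
Theorems 9 and 10 can be checked numerically on a small matrix. The sketch below is an illustration under stated assumptions: the function names `det`, `adjoint` and `matmul` are chosen for this example, the determinant is computed by expansion along the first row (adequate only for small orders), and the sample matrix is invented.

```python
from fractions import Fraction

def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def adjoint(m):
    """Adjoint matrix: the entry in row i, column j is the cofactor of a_ji."""
    n = len(m)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
        return (-1) ** (i + j) * (det(minor) if minor else 1)

    return [[cofactor(j, i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Matrix product (p_ik) = (sum over j of a_ij * b_jk)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

alpha = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]      # invented symmetric matrix
eps = adjoint(alpha)
d = det(alpha)
reciprocal = [[Fraction(x, d) for x in row] for row in eps]

print(matmul(alpha, eps))          # Theorem 9: D(alpha) times the identity
print(matmul(alpha, reciprocal))   # Theorem 10: the identity
```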


V. Symmetric Matrices of Positive Type¹⁰


























Let $\alpha = (a_{ij})$ be a matrix of n rows and m columns; and let $\sigma = (k_1, \ldots, k_n)$
and $\rho = (h_1, \ldots, h_m)$ be integers among the sets (1, ..., n) and (1, ..., m)
respectively. The subsets $\sigma$ and $\rho$ may be equal to the whole sets (1, ..., n)
and (1, ..., m) respectively. Then

\[ \alpha(\sigma, \rho) = (a_{k_i h_j}) \tag{3} \]

is called a minor of $\alpha$. In notation we write this minor as $\alpha(\sigma, \rho)$ indicating the
ranges to be $\sigma$ and $\rho$.

The minor $\alpha(-\sigma, -\rho)$, which is obtained by striking out all the $k_i$-th ($i = 1, \ldots, n$)
columns and $h_j$-th ($j = 1, \ldots, m$) rows from $\alpha$, is called the
complementary minor of $\alpha(\sigma, \rho)$.

If $\alpha$ is a square matrix of order n, then $\alpha(\sigma, \sigma)$ is called a principal minor of $\alpha$.

Let $\alpha$ and $\beta$ be matrices of n rows and m columns; and let $\sigma$, $\rho$ have the same
meaning as above. Then $\alpha(\sigma, \rho)$, $\beta(\sigma, \rho)$ are called corresponding minors in
$\alpha$, $\beta$ respectively.

A symmetric matrix $\alpha = (a_{ij})$ of order n is said to be of positive type in case
the determinant of every principal minor of $\alpha$ is $\ge 0$, and is said to be of properly
positive type in case the determinant of every principal minor of $\alpha$ is greater than
zero.
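
The two definitions can be tested directly on small matrices by enumerating every principal minor. The sketch below is illustrative only: the function names are hypothetical, the enumeration grows exponentially with the order and is meant for small examples, and "positive" is read as "not less than zero" in the sense used in this paper.

```python
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def principal_minor_dets(a):
    """Determinants of all principal minors a(sigma, sigma), sigma non-empty."""
    n = len(a)
    dets = []
    for size in range(1, n + 1):
        for sigma in combinations(range(n), size):
            sub = [[a[i][j] for j in sigma] for i in sigma]
            dets.append(det(sub))
    return dets

def is_positive(a):
    """Positive type: every principal minor has determinant >= 0."""
    return all(d >= 0 for d in principal_minor_dets(a))

def is_properly_positive(a):
    """Properly positive type: every principal minor has determinant > 0."""
    return all(d > 0 for d in principal_minor_dets(a))

print(is_properly_positive([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))
print(is_positive([[1, 1], [1, 1]]), is_properly_positive([[1, 1], [1, 1]]))
```

The second sample matrix is of positive type but not of properly positive type, since its full determinant vanishes.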

COROLLARY V1. Every element in the principal diagonal of a positive, symmetric
matrix is positive.

For, let $\sigma$ consist of a single integer i, then $\alpha(\sigma, \sigma) = a_{ii} \ge 0$.

COROLLARY V2. If a symmetric matrix is properly positive, then every element
in the principal diagonal is greater than 0.

THEOREM 11. If a symmetric matrix $\alpha$ of order n is (properly) positive, then its
adjoint matrix $\varepsilon$ is also symmetric and (properly) positive.


10 We follow the terminology of E. H. Moore. Moore developed this notion quite
extensively.















The symmetry of $\varepsilon$ is evident. Let $\sigma$ be a subset of (1, ..., n) and let p be
the number of integers in $\sigma$. Consider any principal minor $\varepsilon(\sigma, \sigma)$ in the adjoint
matrix $\varepsilon$. By a theorem in the theory of determinants, we have¹¹

\[ D[\varepsilon(\sigma, \sigma)] = (-1)^{k} \cdot D[\alpha(-\sigma, -\sigma)] \cdot [D(\alpha)]^{p-1}, \]

where k is an integer depending on the set $\sigma$. By hypothesis $\alpha$ is positive (properly
positive); hence $D[\alpha(-\sigma, -\sigma)]$ and $[D(\alpha)]^{p-1}$ are positive (greater than 0),
and it follows that $D[\varepsilon(\sigma, \sigma)]$ is positive (greater than 0).

THEOREM 12. If a symmetric matrix is properly positive, then $D(\alpha)$ is different
from zero, and $\alpha$ has a reciprocal $\alpha^{-1}$, which is also symmetric and properly positive.

For take $\sigma$ to be the whole set (1, ..., n) in the definition of proper positiveness,
and we see that $D(\alpha) \neq 0$. The theorem now follows from Theorems 10
and 11.


VI. Gramian Matrices 


In this section, we shall study the matrices of the normal equations (3.9).
The main result is that if the set of observations $A_1, \ldots, A_r$ is linearly independent,
then the matrix (called Gramian matrix) is properly positive and has a
reciprocal which is also properly positive.

THEOREM 13. Let $A_1, \ldots, A_r$ be a set of vectors, and let $B_1, \ldots, B_r$ be the
orthogonalized set of vectors. Then the matrix

\[
\varphi(A_1, \ldots, A_r) =
\begin{pmatrix}
(A_1, A_1) & \cdots & (A_1, A_r) \\
\vdots &  & \vdots \\
(A_r, A_1) & \cdots & (A_r, A_r)
\end{pmatrix} \tag{6.1}
\]

has the following properties:
13.1) symmetry,
13.2) $D[\varphi(A_1, \ldots, A_r)] = n(B_1)\,n(B_2) \cdots n(B_r)$,
13.3) positiveness.
A matrix of the form (6.1) is called a Gramian matrix.
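
Property 13.2) lends itself to a direct numerical check. The sketch below is an illustration with hypothetical function names and invented vectors: it forms the Gramian matrix, computes its determinant, and compares it with the product of the norms of the orthogonalized vectors, once for a linearly independent set and once for a dependent one.

```python
from fractions import Fraction

def inner(v, u):
    """Inner product (V, U) of (1.2)."""
    return sum(x * y for x, y in zip(v, u))

def orthogonalize(A):
    """B_1, ..., B_r from (2.1) and (2.11), including the zero-norm case."""
    B = []
    for a in A:
        b = [Fraction(x) for x in a]
        for prev in B:
            n_prev = inner(prev, prev)
            h = inner(a, prev) / n_prev if n_prev != 0 else Fraction(0)
            b = [bi - h * pi for bi, pi in zip(b, prev)]
        B.append(b)
    return B

def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def gramian(A):
    """Gramian matrix (6.1) with entries (A_i, A_j)."""
    return [[inner(ai, aj) for aj in A] for ai in A]

def norm_product(A):
    """n(B_1) n(B_2) ... n(B_r) for the orthogonalized set of A."""
    product = Fraction(1)
    for b in orthogonalize(A):
        product *= inner(b, b)
    return product

independent = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
dependent = [[1, 1, 0], [1, 0, 1], [2, 1, 1]]   # third vector = first + second
print(det(gramian(independent)), norm_product(independent))
print(det(gramian(dependent)), norm_product(dependent))
```

For the dependent set both quantities vanish, which is consistent with the equivalence of 14.1) and 14.3) in Theorem 14.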
In fact, the symmetric property follows from the fact that $(A_i, A_j) = (A_j, A_i)$
for every i, j.
We shall prove 13.2) by induction. For r = 1, we have by Theorem 5

\[ (A_1, A_1) = (B_1, B_1) = n(B_1). \]

Assume the equality is true for r = t; we shall show it is true for r = t + 1.
The (t + 1)-row determinant is as follows:

\[
D[\varphi(A_1, \ldots, A_{t+1})] =
\begin{vmatrix}
(A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\
\vdots &  & \vdots & \vdots \\
(A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\
(A_{t+1}, A_1) & \cdots & (A_{t+1}, A_t) & (A_{t+1}, A_{t+1})
\end{vmatrix}. \tag{6.2}
\]


11 In case $\sigma$ = (1, ..., n), $-\sigma$ is a null class $\Lambda$ (a class which contains no element); then
we define $D[\alpha(-\sigma, -\sigma)] = 1$. For the proof of this theorem, see Bocher, p. 31.









By Theorem 5, there exist constants $s_i$ ($i = 1, \ldots, t$) such that

\[ A_{t+1} = B_{t+1} + \sum_{i=1}^{t} s_i A_i. \]

Substituting this value into the last row, we find the element in the j-th column is

\[ (A_j, A_{t+1}) = \Bigl(A_j,\; B_{t+1} + \sum_{i=1}^{t} s_i A_i\Bigr) = (A_j, B_{t+1}) + \sum_{i=1}^{t} s_i\,(A_j, A_i) \qquad (j = 1, \ldots, t+1). \]

The second term on the right is a linear combination of the first t elements in the
j-th column of the determinant (6.2) and hence by the theory of determinants,¹²
we secure

\[
D[\varphi(A_1, \ldots, A_{t+1})] =
\begin{vmatrix}
(A_1, A_1) & \cdots & (A_1, A_t) & (A_1, A_{t+1}) \\
\vdots &  & \vdots & \vdots \\
(A_t, A_1) & \cdots & (A_t, A_t) & (A_t, A_{t+1}) \\
(A_1, B_{t+1}) & \cdots & (A_t, B_{t+1}) & (A_{t+1}, B_{t+1})
\end{vmatrix}.
\]

By Theorem 5, we find that $(A_j, B_{t+1}) = 0$ for $j = 1, \ldots, t$, and
$(A_{t+1}, B_{t+1}) = (B_{t+1}, B_{t+1})$, and hence the preceding determinant reduces to a form in which
the first t elements in the (t + 1)-th row are zero. Thus

\[
D[\varphi(A_1, \ldots, A_{t+1})] = n(B_{t+1})
\begin{vmatrix}
(A_1, A_1) & \cdots & (A_1, A_t) \\
\vdots &  & \vdots \\
(A_t, A_1) & \cdots & (A_t, A_t)
\end{vmatrix}
= n(B_1)\,n(B_2) \cdots n(B_t)\,n(B_{t+1}),
\]

which proves 13.2).
Consider any subset $\sigma = (k_1, \ldots, k_m)$ of the set (1, ..., r). By the same
argument as above, we find that the determinant of any principal minor

\[
\begin{vmatrix}
(A_{k_1}, A_{k_1}) & \cdots & (A_{k_1}, A_{k_m}) \\
\vdots &  & \vdots \\
(A_{k_m}, A_{k_1}) & \cdots & (A_{k_m}, A_{k_m})
\end{vmatrix}
= n(B_{k_1}) \cdots n(B_{k_m}). \tag{6.3}
\]

By Theorem 1, the number on the right is positive. Thus the matrix $\varphi$ is
positive.

THEOREM 14. The following three assertions are equivalent:

14.1) the set $A_1, \ldots, A_r$ is linearly independent;

14.2) the Gramian matrix (6.1) is properly positive;

14.3) the determinant of the Gramian matrix (6.1) is different from zero.

We shall prove that 14.1) implies 14.2); 14.2) implies 14.3); and 14.3) implies
14.1). We thus prove the three statements are equivalent.


















12 Dickson, First Course in the Theory of Equations (1922), p. 113. 








Let $B_1, \ldots, B_r$ be the orthogonalized set of the set $A_1, \ldots, A_r$. Since the
set $A_1, \ldots, A_r$ is linearly independent, every subset

$$A_{k_1}, \ldots, A_{k_m} \qquad (1 \le k_1 < k_2 < \cdots < k_m \le r)$$

is also linearly independent, and hence $n(B_{k_i}) > 0$ for $i = 1, 2, \ldots, m$. By the
same argument as given in the demonstration of Theorem 11, we find that the
determinant of any principal minor (6.3) is greater than zero. This proves the
matrix (6.1) is properly positive.

If the matrix (6.1) is properly positive, then by Theorem 10 the determinant 
of (6.1) is different from zero. 

To prove 14.3) implies 14.1), suppose $k_i$ ($i = 1, \ldots, r$) are such that

$$k_1 A_1 + \cdots + k_r A_r = 0.$$

Then

$$(k_1 A_1 + \cdots + k_r A_r,\; A_t) = k_1 (A_1, A_t) + \cdots + k_r (A_r, A_t) = 0$$

for $t = 1, \ldots, r$. Since $(A_i, A_j) = (A_j, A_i)$, and $D(\varphi) \neq 0$, the set of constants
$k_i$ must be all equal to 0.¹³

From Theorem 14, and Theorem 10, we may state the following 


COROLLARY: If the set of observations $A_1, \ldots, A_r$ is linearly independent, then
the Gramian matrix $\varphi$ has a reciprocal which is properly positive.
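Theorem 13 and Theorem 14 can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes that $n(B)$ denotes the inner product $(B, B)$ and that the orthogonalized set is the unnormalized one, $B_k = A_k - \sum_i \frac{(A_k, B_i)}{(B_i, B_i)} B_i$. The helper names (`inner`, `orthogonalize`, `gram`, `det`, `norm_product`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def inner(u, v):
    """Inner product (U, V) in exact rational arithmetic."""
    return sum(Fraction(x) * Fraction(y) for x, y in zip(u, v))


def orthogonalize(vectors):
    """Unnormalized Gram-Schmidt (assumed form of the orthogonalized set)."""
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for prev in basis:
            norm = inner(prev, prev)
            if norm != 0:  # a zero B_i contributes nothing to later vectors
                coeff = inner(a, prev) / norm
                b = [bi - coeff * pi for bi, pi in zip(b, prev)]
        basis.append(b)
    return basis


def gram(vectors):
    """Gramian matrix whose (i, j) element is (A_i, A_j)."""
    return [[inner(u, v) for v in vectors] for u in vectors]


def det(matrix):
    """Determinant by exact Gaussian elimination with row exchanges."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return result


def norm_product(vectors):
    """n(B_1) n(B_2) ... n(B_r) for the orthogonalized set."""
    product = Fraction(1)
    for b in orthogonalize(vectors):
        product *= inner(b, b)
    return product
```

For a linearly independent set the Gramian determinant equals the product of the $n(B_i)$ and is different from zero; for a dependent set both quantities vanish.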


VII. Gauss Method of Substitution 


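LEMMA 7.1) Let $\varphi = (s_{ij})$ be an $r$-row symmetric matrix such that $s_{11} \neq 0$.
Then there exists an $r$-row square matrix $\tau$ whose determinant is unity such that
$\psi = (r_{ij}) = \tau\varphi$ has the following properties:

a) $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for every $i$;

b) the first minor of $r_{11}$ is symmetric;

c) the determinant of every principal minor in $\psi$ of the form

$$\begin{pmatrix}
s_{11} & s_{1k_2} & \cdots & s_{1k_m} \\
0 & r_{k_2k_2} & \cdots & r_{k_2k_m} \\
\vdots & \vdots & & \vdots \\
0 & r_{k_mk_2} & \cdots & r_{k_mk_m}
\end{pmatrix}$$

is equal to the determinant of the corresponding principal minor in $\varphi$.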
To prove this lemma, let us define

$$\tau = \delta + F_1 \cdot D_1, \tag{7.2}$$

where $D_1$ is the first row of an $r$-row identity matrix $\delta$, and $F_1(1) = 0$,

$$F_1(n) = -s_{1n}/s_{11} \qquad (n > 1).$$

(Thus $F_1 D_1$ is an $r$-row square matrix in which the first column is $F_1$ and
everywhere else 0.) It is clear that $\tau$ thus defined is a square matrix of order $r$, and


13 See footnote 5. 










$D(\tau) = D(\delta + F_1 D_1) = 1$. By multiplication of these two matrices, $\tau\varphi$, we
obtain a new matrix such that $r_{11} = s_{11}$, $r_{i1} = 0$ for $i > 1$, and $r_{1i} = s_{1i}$ for
every $i$, and further

$$r_{ij} = s_{ij} - s_{1i} \cdot s_{1j}/s_{11} \qquad \text{for } i \ge 2. \tag{7.3}$$


To prove property (b), we note that $s_{ij} = s_{ji}$, since $\varphi$ is symmetric. Thus for
$i > 1$, $j > 1$, we note from (7.3) that

$$r_{ij} = s_{ij} - s_{1i}\, s_{1j}/s_{11} = s_{ji} - s_{1j}\, s_{1i}/s_{11} = r_{ji}.$$


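For the proof of the last property, we note that the corresponding minor of
(7.1) in $\varphi$ is of the form

$$\begin{pmatrix}
s_{11} & s_{1k_2} & \cdots & s_{1k_m} \\
s_{1k_2} & s_{k_2k_2} & \cdots & s_{k_2k_m} \\
\vdots & \vdots & & \vdots \\
s_{1k_m} & s_{k_2k_m} & \cdots & s_{k_mk_m}
\end{pmatrix} \tag{7.4}$$

Since $\varphi$ is symmetric, we have by (7.3),

$$r_{k_ik_j} = s_{k_ik_j} - s_{1k_i}\, s_{1k_j}/s_{11} \qquad (i > 1,\ j > 1),$$
$$0 = s_{k_i1} - s_{1k_i}\, s_{11}/s_{11} \qquad (i > 1).$$

Thus by a theorem in the theory of determinants, the determinants of (7.1) and
(7.4) are equal.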
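The construction in Lemma 7.1) can be sketched in a few lines. This is an illustration under stated assumptions, not the author's own computation: the function name `first_step` is hypothetical, matrices are plain lists of rows, and the matrix passed in is assumed symmetric with a non-zero leading element.

```python
from fractions import Fraction


def first_step(phi):
    """Return (tau, psi) with tau = delta + F_1 D_1 and psi = tau * phi.

    phi is assumed to be a symmetric square matrix with phi[0][0] != 0.
    """
    r = len(phi)
    # Start from the identity matrix delta.
    tau = [[Fraction(int(i == j)) for j in range(r)] for i in range(r)]
    for n in range(1, r):
        # F_1(n) = -s_1n / s_11 fills the first column below the diagonal.
        tau[n][0] = -Fraction(phi[0][n]) / Fraction(phi[0][0])
    psi = [
        [sum(tau[i][k] * phi[k][j] for k in range(r)) for j in range(r)]
        for i in range(r)
    ]
    return tau, psi
```

Because $\tau$ is lower triangular with unit diagonal, its determinant is 1, so $\psi$ and $\varphi$ have the same determinant; the first row of $\psi$ is that of $\varphi$, and the remaining elements follow (7.3).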

LEMMA 7.2) Let $\varphi = (s_{ij})$ ($i, j = 1, \ldots, r$) be a symmetric matrix of positive
type, and $s_{11} \neq 0$. Then there exists an $r$-row square matrix $\tau$ whose determinant
is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1) and
furthermore the minor of $r_{11}$ in 7.1) is of positive type.

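To prove the positiveness of the minor of $r_{11}$, let the determinant of any one
of its principal minors be

$$M_1 = \begin{vmatrix}
r_{k_2k_2} & \cdots & r_{k_2k_m} \\
\vdots & & \vdots \\
r_{k_2k_m} & \cdots & r_{k_mk_m}
\end{vmatrix} \qquad (1 < k_2 < \cdots < k_m \le r),$$

where $r_{k_ik_j} = r_{k_jk_i}$ ($i, j = 2, \ldots, m$) due to the symmetry. Now consider the
bordered determinant

$$M_2 = \begin{vmatrix}
r_{11} & r_{1k_2} & \cdots & r_{1k_m} \\
0 & r_{k_2k_2} & \cdots & r_{k_2k_m} \\
\vdots & \vdots & & \vdots \\
0 & r_{k_2k_m} & \cdots & r_{k_mk_m}
\end{vmatrix}$$

which by property (a) in Lemma 7.1) gives $M_2 = r_{11} M_1 = s_{11} M_1$. By property
(c) in Lemma 7.1), $M_2$ is equal to the determinant of the form (7.4), which by
hypothesis is positive. Thus $s_{11} M_1 \ge 0$. Since $s_{11} > 0$, we conclude that
$M_1 = M_2/s_{11} \ge 0$.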













LEMMA 7.3). Let $\varphi = (s_{ij})$ ($i, j = 1, 2, \ldots, r$) be a symmetric matrix of
properly positive type. Then there exists an $r$-row square matrix $\tau$ whose
determinant is unity such that $\psi = (r_{ij}) = \tau\varphi$ has the properties stated in Lemma 7.1)
and furthermore the minor of $r_{11}$ in $\psi$ is properly positive.

Since $\varphi$ is properly positive, we find that $s_{11} > 0$. The proof of this lemma is
similar to that of Lemma 7.2).

Suppose that the set of observations $A_1, \ldots, A_r$ is linearly independent.
Then by Theorem 14, the Gramian matrix (6.1) is symmetric and properly
positive, and hence $(A_1, A_1) > 0$. By Lemma 7.3), the matrix (6.1) can be
reduced to the form

$$\begin{pmatrix}
[A_1A_1\cdot 0] & [A_1A_2\cdot 0] & [A_1A_3\cdot 0] & \cdots & [A_1A_r\cdot 0] \\
0 & [A_2A_2\cdot 1] & [A_2A_3\cdot 1] & \cdots & [A_2A_r\cdot 1] \\
\vdots & \vdots & \vdots & & \vdots \\
0 & [A_rA_2\cdot 1] & [A_rA_3\cdot 1] & \cdots & [A_rA_r\cdot 1]
\end{pmatrix} \tag{7.5}$$

where

$$[A_sA_t\cdot 0] = (A_s, A_t) = [A_tA_s\cdot 0] \qquad (s, t = 1, \ldots, r),$$

$$[A_sA_t\cdot 1] = \frac{[A_1A_1\cdot 0]\,[A_sA_t\cdot 0] - [A_1A_s\cdot 0]\,[A_1A_t\cdot 0]}{[A_1A_1\cdot 0]}.$$

It is evident that $[A_1A_1\cdot 0] = (A_1, A_1) > 0$, since the matrix (6.1) is properly
positive. By Lemma 7.3) the value of $D(\varphi)$ and the determinant of (7.5) are
equal, and furthermore the minor of the element $[A_1A_1\cdot 0]$ is a symmetric matrix
of properly positive type. Thus $[A_2A_2\cdot 1] > 0$, and $[A_sA_t\cdot 1] = [A_tA_s\cdot 1]$.

The minor of $[A_1A_1\cdot 0]$ surely satisfies all the conditions in Lemma 7.3). We
may, therefore, apply a transformation of the form (7.2) to the minor of $[A_1A_1\cdot 0]$,
and secure another matrix of the same character as (7.5). In other words, we
may multiply on the left of the matrix (7.5) by

$$\tau_2 = \delta + F_2 D_2, \tag{7.6}$$

where $D_2$ is the second row of the $r$-row identity matrix $\delta$, and

$$F_2(n) = 0 \quad (n \le 2); \qquad F_2(n) = -\frac{[A_2A_n\cdot 1]}{[A_2A_2\cdot 1]} \quad (n > 2).$$

In general, let

$$\tau_i = \delta + F_i D_i \qquad (i = 1, \ldots, r - 1), \tag{7.7}$$

where $D_i$ is the $i$th row of the $r$-row identity matrix $\delta$, and

$$F_i(n) = 0 \quad (n \le i); \qquad F_i(n) = -\frac{[A_iA_n\cdot i-1]}{[A_iA_i\cdot i-1]} \quad (n > i). \tag{7.8}$$







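Continuous application of this type of transformation ultimately reduces the
matrix (6.1) to the form

$$\psi = \begin{pmatrix}
[A_1A_1\cdot 0] & [A_1A_2\cdot 0] & [A_1A_3\cdot 0] & \cdots & [A_1A_r\cdot 0] \\
0 & [A_2A_2\cdot 1] & [A_2A_3\cdot 1] & \cdots & [A_2A_r\cdot 1] \\
0 & 0 & [A_3A_3\cdot 2] & \cdots & [A_3A_r\cdot 2] \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & [A_rA_r\cdot r-1]
\end{pmatrix} \tag{7.9}$$

where

$$[A_tA_s\cdot h] = \frac{[A_hA_h\cdot h-1]\,[A_tA_s\cdot h-1] - [A_hA_t\cdot h-1]\,[A_hA_s\cdot h-1]}{[A_hA_h\cdot h-1]} \tag{7.91}$$

($t, s = 1, \ldots, r$; $0 < h \le \operatorname{sm}(t, s)$).¹⁴

In the matrix (7.9), we see by virtue of Lemma 7.3) that $[A_iA_i\cdot i-1] > 0$ for
every $i$, and $[A_sA_t\cdot h] = [A_tA_s\cdot h]$ for every $s$, $t$ and $0 < h \le \operatorname{sm}(t, s)$. If
$h = \operatorname{sm}(t, s)$, then $[A_sA_t\cdot h] = 0$.

Let $\tau = \tau_{r-1} \cdot \tau_{r-2} \cdots \tau_1$. Then by the associative law of multiplication of
matrices, we see that

$$\psi = (\tau_{r-1} \cdots \tau_1)\varphi = \tau\varphi. \tag{7.10}$$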
Thus we prove 

THEOREM 15. If the set of vectors $A_1, \ldots, A_r$ is linearly independent, then there
exists a square matrix $\tau$ of order $r$ such that $\tau\varphi$ is of the form (7.9) where all
elements below the principal diagonal are 0; every element in the principal diagonal,
$[A_iA_i\cdot i-1]$ ($i = 1, \ldots, r$), is greater than zero; and $[A_sA_t\cdot h] = [A_tA_s\cdot h]$ for
$s, t = 1, \ldots, r$, and $h < \operatorname{sm}(t, s)$. Furthermore the determinants of the matrices
(6.1) and (7.9) are equal.
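The bracket symbols of (7.9) can be computed directly from the recursion (7.91), written in the equivalent form $[A_tA_s\cdot h] = [A_tA_s\cdot h-1] - [A_hA_t\cdot h-1][A_hA_s\cdot h-1]/[A_hA_h\cdot h-1]$ with $[A_tA_s\cdot 0] = (A_t, A_s)$. The sketch below assumes exactly that reading; `make_bracket` and `bracket` are hypothetical names, and indices are 1-based to match the text.

```python
from fractions import Fraction
from functools import lru_cache


def make_bracket(vectors):
    """Return a function bracket(t, s, h) giving [A_t A_s . h] (1-based t, s)."""

    def inner(u, v):
        return sum(Fraction(x) * Fraction(y) for x, y in zip(u, v))

    @lru_cache(maxsize=None)
    def bracket(t, s, h):
        if h == 0:
            return inner(vectors[t - 1], vectors[s - 1])
        pivot = bracket(h, h, h - 1)  # assumed non-zero (independent set)
        return bracket(t, s, h - 1) - (
            bracket(h, t, h - 1) * bracket(h, s, h - 1) / pivot
        )

    return bracket
```

The element in row $t$ and column $s \ge t$ of (7.9) is then `bracket(t, s, t - 1)`, and the product of the diagonal elements reproduces the determinant of the Gramian matrix.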

We now prove the following lemma which will be useful in the later section.

LEMMA 7.4). If $[A_iA_i\cdot i-1]$ is different from zero for every $i > 0$, then for
every pair of integers $(s, t)$, where $s, t = 1, \ldots, r$, and $n \le \operatorname{sm}(t, s)$, we have

a) $\displaystyle [A_tA_s\cdot n] = (A_t, A_s) - \sum_{i=1}^{n} \frac{[A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]}\,[A_iA_s\cdot i-1]$.

b) $[A_t(A_s + A_u)\cdot n] = [A_tA_s\cdot n] + [A_tA_u\cdot n]$, ($u = 1, \ldots, r$).

c) $[(cA_t)A_s\cdot n] = c\,[A_tA_s\cdot n]$, ($c$ = a constant).


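To prove a), take every pair $(s, t)$. We find the lemma is true for $n = 0$.
Assuming it is true for every $(s, t)$ and for $n = h < \operatorname{sm}(s, t)$, we find that
$h + 1 \le \operatorname{sm}(s, t)$, and

$$(A_t, A_s) - \sum_{i=1}^{h+1} \frac{[A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]}\,[A_iA_s\cdot i-1]
= (A_t, A_s) - \sum_{i=1}^{h} \frac{[A_iA_t\cdot i-1]}{[A_iA_i\cdot i-1]}\,[A_iA_s\cdot i-1] - \frac{[A_{h+1}A_t\cdot h]}{[A_{h+1}A_{h+1}\cdot h]}\,[A_{h+1}A_s\cdot h]$$

$$= [A_tA_s\cdot h] - \frac{[A_{h+1}A_t\cdot h]}{[A_{h+1}A_{h+1}\cdot h]}\,[A_{h+1}A_s\cdot h] = [A_tA_s\cdot h+1],$$

for every $s$, $t$.
Parts b) and c) are true for $n = 0$. Now make use of the equality in a) and
prove by induction.

14 sm(s, t) read “the smaller one of (t, s).”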


VIII. Gauss’s Method of Substitution and its Relation to Gram-Schmidt’s
Orthogonalization Process


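Let us write the set of observations in the form:

$$\alpha = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{r1} & \cdots & a_{rn} \end{pmatrix}.$$

Let the orthogonalized set also be written in the form

$$\beta = \begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{r1} & \cdots & b_{rn} \end{pmatrix}.$$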


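From Theorems 5 and 6, we find that there exists a transformation $\kappa$ given by an
$r$-row square matrix such that $\beta = \kappa\alpha$. Thus by the associative law of
multiplication of matrices, we have

$$\beta\alpha' = (\kappa\alpha)\alpha' = \kappa(\alpha\alpha').$$

Now the matrix $\alpha\alpha'$ is the Gramian matrix (6.1). Thus

$$\beta\alpha' = \kappa\varphi. \tag{8.1}$$

The composite matrix $\beta\alpha'$ is of the form

$$\begin{pmatrix}
(B_1, A_1) & (B_1, A_2) & \cdots & (B_1, A_r) \\
\vdots & \vdots & & \vdots \\
(B_r, A_1) & (B_r, A_2) & \cdots & (B_r, A_r)
\end{pmatrix}$$

By Theorems 5 and 6, we note that $(B_s, A_t) = 0$ for $s > t$, and $(B_s, A_s) =
(B_s, B_s)$ for every $s$. Thus the preceding matrix can be written in the form

$$\begin{pmatrix}
(B_1, B_1) & (B_1, A_2) & (B_1, A_3) & \cdots & (B_1, A_r) \\
0 & (B_2, B_2) & (B_2, A_3) & \cdots & (B_2, A_r) \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & (B_r, B_r)
\end{pmatrix} \tag{8.3}$$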













We have proved the following theorem: 

THEOREM 16. Let $A_1, \ldots, A_r$ be a set of vectors, and $B_1, \ldots, B_r$ be the
orthogonalized set; and let $\alpha = (a_{ij})$, $\beta = (b_{ij})$. Then there exists a square $r$-row
matrix $\kappa$ such that $\beta = \kappa\alpha$, and $\kappa\alpha\alpha'$ is a matrix of the form (8.3) where all the
elements below the principal diagonal are zeros and every element in the principal
diagonal is positive. If the set $A_1, \ldots, A_r$ is linearly independent, then every
element in the principal diagonal is greater than zero.

THEOREM 17. Let $A_1, \ldots, A_r$ be a set of vectors and $B_1, \ldots, B_r$ be the
orthogonalized set; and let $\alpha = (a_{ij})$, $\beta = (b_{ij})$. Then $D(\beta\alpha') = D(\alpha\alpha')$.

For by equations (2.1), we note that $D(\kappa) = 1$. Thus

$$D(\beta\alpha') = D(\kappa\alpha\alpha') = D(\kappa)\,D(\alpha\alpha') = D(\alpha\alpha').$$

THEOREM 18. If the set of vectors $A_1, \ldots, A_r$ is linearly independent, the
matrix $\kappa$ arising from Gram-Schmidt’s orthogonalization process is identical with
the matrix $\tau$ defined by (7.10).

To prove this theorem, we first establish the following 

LEMMA 8.5): If the set $A_1, \ldots, A_r$ be linearly independent, and $B_1, \ldots, B_r$ be
the orthogonalized set, then for every $t$, $h$, we have

$$(B_h, A_t) = [A_hA_t\cdot h-1].$$


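By Theorem 10, the set $B_i$ is linearly independent, and hence $n(B_i) > 0$ for
every $i$. The lemma is evidently true for every $t$ and $h = 1$. Assuming it is true
for every $t$ and $h \le s$, we shall prove it is also true for every $t$ and $h = s + 1$.
Now

$$B_{s+1} = A_{s+1} - \sum_{i=1}^{s} \frac{(A_{s+1}, B_i)}{(B_i, B_i)}\, B_i
= A_{s+1} - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\, B_i.$$

Thus by the linear property of $(\ ,\ )$ we secure, for every $t$,

$$(B_{s+1}, A_t) = \Bigl(A_{s+1} - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\, B_i,\; A_t\Bigr)
= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,(B_i, A_t)$$

$$= (A_{s+1}, A_t) - \sum_{i=1}^{s} \frac{[A_iA_{s+1}\cdot i-1]}{[A_iA_i\cdot i-1]}\,[A_iA_t\cdot i-1]
= [A_{s+1}A_t\cdot s]$$

by virtue of Lemma 7.4).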
From this lemma, we conclude at once that the matrices (7.9) and (8.3) are
equal. Thus by (8.1), we have

$$\kappa\varphi = \beta\alpha' = \tau\varphi, \quad \text{or} \quad (\kappa - \tau)\varphi = \omega.$$

Since $\varphi$ is non-singular (by Theorem 12), we have

$$\omega = (\kappa - \tau)\varphi\varphi^{-1} = (\kappa - \tau)\delta = \kappa - \tau,$$

which proves the theorem.

From Lemma 8.5), we have 

LEMMA 8.6). Let $L = (l_1, \ldots, l_n)$. Suppose the set $A_1, \ldots, A_r$ to be
linearly independent, and $B_1, \ldots, B_r$ to be the orthogonalized set. Then for
every $h$

$$[A_hL\cdot h-1] = (B_h, L).$$


Theorems 16, 17, and 18 furnish us a new method for finding the most
probable values of the unknowns in the theory of least squares. The formulation of
the system of normal equations may be omitted in this new procedure, which
may be described briefly as follows: After we obtain a set of observations
$A_1, \ldots, A_r$, we orthogonalize this set by means of Gram-Schmidt’s process.
Let $L$ be a non-zero vector. The product

$$\begin{pmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{r1} & \cdots & b_{rn} \end{pmatrix}
\cdot
\begin{pmatrix} a_{11} & \cdots & a_{r1} & -l_1 \\ \vdots & & \vdots & \vdots \\ a_{1n} & \cdots & a_{rn} & -l_n \end{pmatrix}$$

will give us the result as desired by Gauss’s method of substitution.
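The procedure just described can be sketched as follows. This is only an illustration under stated assumptions: the orthogonalized set is taken in its unnormalized form, the observation vectors are assumed linearly independent, and the function names (`orthogonalize`, `reduced_system`, `back_substitute`) are hypothetical. Each row of the product has the form $(B_s, A_1), \ldots, (B_s, A_r), -(B_s, L)$, which is read here as the equation $\sum_t (B_s, A_t)\,x_t - (B_s, L) = 0$.

```python
from fractions import Fraction


def inner(u, v):
    return sum(Fraction(x) * Fraction(y) for x, y in zip(u, v))


def orthogonalize(vectors):
    """Unnormalized Gram-Schmidt; the input set is assumed independent."""
    basis = []
    for a in vectors:
        b = [Fraction(x) for x in a]
        for prev in basis:
            coeff = inner(a, prev) / inner(prev, prev)
            b = [bi - coeff * pi for bi, pi in zip(b, prev)]
        basis.append(b)
    return basis


def reduced_system(vectors, L):
    """Rows of beta * (alpha', -L'): an upper-triangular system."""
    return [
        [inner(b, a) for a in vectors] + [-inner(b, L)]
        for b in orthogonalize(vectors)
    ]


def back_substitute(rows):
    """Solve sum_t rows[s][t] * x_t + rows[s][r] = 0 from the last row up."""
    r = len(rows)
    x = [Fraction(0)] * r
    for s in range(r - 1, -1, -1):
        rhs = -rows[s][r] - sum(rows[s][t] * x[t] for t in range(s + 1, r))
        x[s] = rhs / rows[s][s]
    return x
```

With $A_1 = (1, 1, 1)$, $A_2 = (0, 1, 2)$ and $L = (1, 2, 2)$, that is, a straight line fitted to three points, the reduced rows are $(3, 3, -5)$ and $(0, 2, -1)$, and back substitution gives the least-squares values without the normal equations ever being written down.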


ACADEMIA SINICA, 
PEIPING, CHINA, 





A NOTE ON THE ANALYSIS OF VARIANCE¹


By Solomon Kullback


By considering a set of independent items classified in some relevant manner
into $N$ sets of $s$ items each, and by the use of a dispersion theorem of Prof. J. L.
Coolidge,² Prof. H. L. Rietz³ arrives at estimates of variance, used by Dr. R. A.
Fisher, without making use of arguments involving the number of degrees of
freedom of the items concerned.

By proceeding along the lines followed by Coolidge and Rietz but considering
a set of independent items classified into $N$ sets of $s_i$ ($i = 1, 2, \ldots, N$) items
each, we shall arrive at certain other important results of R. A. Fisher⁴ in his
analysis of variance.

The theorem referred to above is as follows: If $n$ independent quantities
$y_1, y_2, \ldots, y_n$ be given, their expected values being $a_1, a_2, \ldots, a_n$, while the
expected values of their squares are $A_1, A_2, \ldots, A_n$, respectively, and if we
agree to set $\bar{y} = (1/n) \sum_{i=1}^{n} y_i$, $a = (1/n) \sum_{i=1}^{n} a_i$, then the expected value of the
variance, $(1/n) \sum_{i=1}^{n} (y_i - \bar{y})^2$, is

$$\frac{1}{n}\left[\frac{n-1}{n} \sum_{i=1}^{n} (A_i - a_i^2) + \sum_{i=1}^{n} (a_i - a)^2\right]. \tag{1}$$
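The dispersion theorem can be verified exactly on a small example by enumerating every outcome. The sketch below is illustrative only: it reads the theorem as $E\bigl[\tfrac{1}{n}\sum (y_i - \bar{y})^2\bigr] = \tfrac{1}{n}\bigl[\tfrac{n-1}{n}\sum (A_i - a_i^2) + \sum (a_i - a)^2\bigr]$, and the function names and the discrete distributions used are hypothetical.

```python
from fractions import Fraction
from itertools import product


def expected_variance_exact(dists):
    """Exact E[(1/n) sum (y_i - ybar)^2] by enumeration.

    dists holds, for each independent y_i, a list of (value, probability) pairs.
    """
    n = len(dists)
    total = Fraction(0)
    for outcome in product(*dists):
        prob = Fraction(1)
        for _, p in outcome:
            prob *= p
        ys = [Fraction(v) for v, _ in outcome]
        mean = sum(ys) / n
        total += prob * sum((y - mean) ** 2 for y in ys) / n
    return total


def expected_variance_formula(dists):
    """The same expectation from the a_i and A_i alone, as in the theorem."""
    n = len(dists)
    a_i = [sum(Fraction(v) * p for v, p in d) for d in dists]
    A_i = [sum(Fraction(v) ** 2 * p for v, p in d) for d in dists]
    a = sum(a_i) / n
    spread = sum(A - m ** 2 for A, m in zip(A_i, a_i))
    shift = sum((m - a) ** 2 for m in a_i)
    return (Fraction(n - 1, n) * spread + shift) / n
```

The two functions agree for any choice of independent discrete quantities, including quantities with different expected values.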


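Suppose a set of independent items has been classified in some relevant
manner into $N$ sets of $s_i$ ($i = 1, 2, \ldots, N$) items each as follows:

$$\begin{array}{lllll}
x_{11}, & x_{12}, & \cdots, & x_{1s_1}, & \bar{x}_1 \\
\vdots & & & & \vdots \\
x_{N1}, & x_{N2}, & \cdots, & x_{Ns_N}, & \bar{x}_N \\
 & & & & \bar{x}
\end{array} \tag{2}$$

where $\bar{x}_i$ ($i = 1, 2, \ldots, N$) is the arithmetic mean of the $i$th set and $\bar{x}$ the mean
of the pooled sample of $s = s_1 + s_2 + \cdots + s_N$ items.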
We shall assume that the set (2) is statistically homogeneous in the sense that, 


1 Presented to the American Mathematical Society, February 23, 1935. 

2 Bulletin Am. Math. Soc., Vol. 27 (1921) p. 439. 

3 Bulletin Am. Math. Soc., Vol. 38 (1932) pp. 731-735. 

4 Proceedings of the International Math. Congress, Toronto, 1924, Vol. 2, p. 802 ff. 









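using $E(\ )$ for the expected value of the expression in the parenthesis, we may
let $E(x_{ij}) = a$, $E(x_{ij}^2) = A$ ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, s_i$).
Then, using (1),

$$E\Bigl(\sum_{j=1}^{s_i} (x_{ij} - \bar{x}_i)^2\Bigr) = (s_i - 1)(A - a^2). \tag{3}$$

Summing (3) from $i = 1$ to $N$, we have

$$E\Bigl(\sum_{i=1,\,j=1}^{N,\,s_i} (x_{ij} - \bar{x}_i)^2\Bigr) = (A - a^2) \sum_{i=1}^{N} (s_i - 1) = (s - N)(A - a^2). \tag{4}$$

Similarly, by using (1),

$$E\Bigl(\sum_{i=1}^{N} s_i (\bar{x}_i - \bar{x})^2\Bigr) = \frac{N-1}{N} \sum_{i=1}^{N} s_i\,[E(\bar{x}_i^2) - a^2].$$

$E(\bar{x}_i^2) - a^2 = E(\bar{x}_i - a)^2$, and

$$E(\bar{x}_i - a)^2 = (A - a^2)/s_i, \tag{7}$$

therefore

$$E\Bigl(\sum_{i=1}^{N} s_i (\bar{x}_i - \bar{x})^2\Bigr) = (N - 1)(A - a^2). \tag{8}$$

Similarly by using (1),

$$E\Bigl(\sum_{i=1,\,j=1}^{N,\,s_i} (x_{ij} - \bar{x})^2\Bigr) = (s - 1)(A - a^2). \tag{9}$$

Thus, in a statistically homogeneous set of items, classified as in (2), the
following estimates of variance have the same expected value:

$$\frac{S}{s - 1}, \quad \text{where } S = \sum_{i=1,\,j=1}^{N,\,s_i} (x_{ij} - \bar{x})^2;$$

$$\frac{S_1}{s - N}, \quad \text{where } S_1 = \sum_{i=1,\,j=1}^{N,\,s_i} (x_{ij} - \bar{x}_i)^2;$$

$$\frac{S_2}{N - 1}, \quad \text{where } S_2 = \sum_{i=1}^{N} s_i (\bar{x}_i - \bar{x})^2.$$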


eI? 


These estimates are used in applying the analysis of variance to the study of 
the correlation ratio, 7, for uncorrelated material, where 7? = S;/S. 
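That the three estimates share one expected value can be confirmed exactly for a small case. The sketch below takes the estimates to be $S/(s-1)$, $S_1/(s-N)$ and $S_2/(N-1)$, as equations (4), (8) and (9) suggest, and enumerates every outcome for items drawn independently and uniformly from a short list of integers. The function names and the example values are hypothetical.

```python
from fractions import Fraction
from itertools import product


def sums_of_squares(groups):
    """Return (S, S1, S2): total, within-set and between-set sums of squares."""
    items = [x for g in groups for x in g]
    grand = Fraction(sum(items), len(items))
    means = [Fraction(sum(g), len(g)) for g in groups]
    S = sum((x - grand) ** 2 for x in items)
    S1 = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    S2 = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    return S, S1, S2


def expected_estimates(values, sizes):
    """Exact expectations of S/(s-1), S1/(s-N), S2/(N-1).

    Items are i.i.d., uniform on the integers in `values` (an assumption of
    this sketch), and are classified into sets of the given sizes.
    """
    s, N = sum(sizes), len(sizes)
    weight = Fraction(1, len(values) ** s)
    totals = [Fraction(0)] * 3
    for outcome in product(values, repeat=s):
        groups, start = [], 0
        for size in sizes:
            groups.append(outcome[start:start + size])
            start += size
        S, S1, S2 = sums_of_squares(groups)
        for k, est in enumerate((S / (s - 1), S1 / (s - N), S2 / (N - 1))):
            totals[k] += weight * est
    return totals
```

For items uniform on 0, 1, 3 we have $a = 4/3$ and $A = 10/3$, so all three expectations should equal $A - a^2 = 14/9$, whatever the set sizes.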


OFFICE OF THE CHIEF SIGNAL OFFICER, 
WASHINGTON, D. C.


5 Rietz, H. L., loc. cit. p. 733. 













A PROBLEM INVOLVING THE LEXIS THEORY OF DISPERSION 
By Walter A. Hendricks


The attention of the author was recently directed to a study of the hatch- 
ability of chicken eggs at the U. S. Animal Husbandry Experiment Station, 
Beltsville, Maryland. It was necessary to find the average hatchability of the 
fertile eggs incubated for each of a number of lots of birds and the corresponding 
standard errors of those averages. 

It was very apparent that some methods for computing such values, in com- 
mon use at the present time, do not give satisfactory results. This is due to the 
fact that the fertile eggs produced by different birds vary considerably with 
respect to hatchability as well as with respect to number of eggs available for 
incubation. It seems reasonable to suppose that the variability in hatch- 
ability of a number of fertile eggs, produced by a given number of birds, should 
obey the Lexis law of dispersion. This supposition is based on two hypotheses: 

(a) The probability that a fertile egg will hatch is constant for all fertile eggs 
produced by the same bird during the time interval under consideration. 

(b) The probability that a fertile egg will hatch varies from bird to bird. 

The reader familiar with the principles of genetics may question the validity 
of the first of these hypotheses. The probability that a fertile egg will hatch is 
largely governed by the genes carried by the chromosomes of the ovum of the 
hen and the sperm of the male bird which fertilized that ovum. The kinds of 
genes carried by various ova and spermatozoa are not necessarily the same, even 
when those ova and spermatozoa are produced by the same female and male 
birds, respectively. However, if we have a sample of a number of fertile eggs 
produced by the same hen, we are justified in assuming that the proportion of 
those eggs which will hatch is constant, except for sampling fluctuations, when 
successive samples of fertile eggs produced by the given hen are incubated, pro- 
vided, of course, that the eggs in the successive samples were fertilized by the 
same male bird or birds. The limit approached by the proportion of fertile eggs 
which hatch as the number of fertile eggs produced by the given hen becomes 
infinitely large may be defined as the probability that a fertile egg produced by 
that hen will hatch. It will be recognized that this definition is based on purely 
academic considerations, since there are physical limitations to the number of 
fertile eggs which a hen can produce in a given period of time. Hypotheses (a) 
and (b) are to be interpreted in the light of this definition of the probability that 
a fertile egg produced by a given bird will hatch. 

Let $s_1, s_2, \ldots, s_n$ represent the numbers of fertile eggs produced by $n$ birds
during a period of time and let $f_1, f_2, \ldots, f_n$, respectively, represent the numbers
of chicks obtained from those eggs when the eggs are incubated. Let $p_k = f_k/s_k$
represent the hatchability of the fertile eggs produced by the $k$th bird.
The squared standard error of $p_k$ is given by the Lexis formula:¹

$$\sigma_{p_k}^2 = \frac{PQ}{s_k} + \frac{s_k - 1}{n s_k} \sum_{i=1}^{n} (P_i - P)^2 \tag{1}$$

in which the $P_i$ represent the respective probabilities that the fertile eggs
produced by the $n$ birds will hatch, $P$ is the arithmetic mean of the $P_i$, and $Q$ is
equal to $1 - P$.
The values of the probabilities, $P_i$, are not known. However, as a first
approximation to equation (1) we may write:

$$\sigma_{p_k}^2 = \frac{pq}{s_k} + \frac{s_k - 1}{n s_k} \sum_{i=1}^{n} (p_i - p)^2 \tag{2}$$

in which $p$ is the arithmetic mean of the $p_i$ and $q$ is equal to $1 - p$.
The product, $pq$, can be accepted as a reasonably close approximation to the
product, $PQ$, but the expression, $\sum_{i=1}^{n} (p_i - p)^2$, will, in general, be greater than
the expression, $\sum_{i=1}^{n} (P_i - P)^2$. The reason for this is apparent when we
consider that if each of these two expressions is divided by $n$, the former yields an
estimate of the squared standard deviation of the $p_i$ while the latter yields an
estimate of the squared standard deviation of the $P_i$. The standard deviation of
the $p_i$ will, in general, be greater than that of the $P_i$ because the $p_i$ are more or
less imperfect estimates of the $P_i$ and are, therefore, subject to sampling errors
from which the $P_i$ are free.


We may write:

$$\frac{1}{n} \sum_{i=1}^{n} (P_i - P)^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \sigma^2 \tag{3}$$

in which $\sigma^2$ is an appropriate correction as yet undefined.
Since the $p_i$ would approach the $P_i$ as statistical limits if each of the $s_i$ were
made extremely large, it follows that $\sigma^2$ must approach zero as each of the $s_i$
approaches infinity. Furthermore, if $P_1 = P_2 = \cdots = P_n = P$, we must have:

$$\frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \sigma^2 = 0 \quad \text{or} \quad \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2. \tag{4}$$

1 The formula as given in this paper is a modification of that given by Rietz, H. L. (1927)
in his book, Mathematical Statistics, Open Court Publishing Co., Chicago, which was
necessary in order to make it applicable to relative frequencies.










These conditions suggest that $\sigma^2$ be defined by the equation:

$$\sigma^2 = \frac{pq}{n} \sum_{i=1}^{n} \frac{1}{s_i}. \tag{5}$$

If $\sigma^2$ is so defined, it will obviously approach zero as each of the $s_i$ approaches
infinity. Furthermore, it has been shown by Yule² that if we have a series of $n$
relative frequencies, such as the $p_i$ under discussion, based on $n$ samples of
unequal size, and the probabilities of the occurrence and non-occurrence,
respectively, of the particular event under consideration are constant from
sample to sample, the squared standard deviation of those relative frequencies
is given by a relation such as that used to define $\sigma^2$ in equation (5). Therefore,
the second condition is also satisfied. $\sigma^2$ may be interpreted as representing
that part of the squared standard deviation of the $p_i$ which is due to
the unreliability of the $p_i$ as estimates of the $P_i$.
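Therefore, it seems reasonable to write:

$$\frac{1}{n} \sum_{i=1}^{n} (P_i - P)^2 = \frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \frac{pq}{n} \sum_{i=1}^{n} \frac{1}{s_i}. \tag{6}$$

Combining equations (1) and (6), we obtain the following formula for
calculating the squared standard error of $p_k$:

$$\sigma_{p_k}^2 = \frac{pq}{s_k} + \frac{s_k - 1}{s_k} \left[\frac{1}{n} \sum_{i=1}^{n} (p_i - p)^2 - \frac{pq}{n} \sum_{i=1}^{n} \frac{1}{s_i}\right]. \tag{7}$$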


Since the weight of a measurement is inversely proportional to the square of
the standard error of the measurement, we are now in a position to calculate a
weighted mean, $\bar{p}$, of the $p_i$:

$$\bar{p} = \frac{\sum_{i=1}^{n} w_i p_i}{\sum_{i=1}^{n} w_i} \tag{8}$$

in which:

$$w_i = \frac{1}{\sigma_{p_i}^2}. \tag{9}$$

The squared standard error of $\bar{p}$ is given by the familiar formula:

$$\sigma_{\bar{p}}^2 = \frac{\sum_{i=1}^{n} w_i (p_i - \bar{p})^2}{(n - 1) \sum_{i=1}^{n} w_i}. \tag{10}$$

2 Yule, G. Udny, 1927. Introduction to the Theory of Statistics, Charles Griffin and
Co., London.


















It would seem that $\bar{p}$ may be accepted as a good estimate of the average
hatchability of the fertile eggs produced by the given lot of birds, and that
equation (10) may be used to obtain a valid estimate of the reliability of $\bar{p}$.
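The calculation may be sketched as follows, reading equation (7) as $\sigma_{p_k}^2 = pq/s_k + \frac{s_k - 1}{s_k}\bigl[\frac{1}{n}\sum (p_i - p)^2 - \frac{pq}{n}\sum 1/s_i\bigr]$ and equations (8) to (10) as the usual weighted mean and its standard error. The function name and the egg counts in the example are hypothetical, and the clamping of a negative bracket to zero is an assumption of this sketch, not something stated in the paper.

```python
def weighted_hatchability(eggs, chicks):
    """Return (p_bar, standard error of p_bar) for one lot of birds.

    eggs[k] and chicks[k] are s_k and f_k for the k-th bird. The unweighted
    mean p is assumed to lie strictly between 0 and 1, so that no variance
    in equation (7) is zero.
    """
    n = len(eggs)
    p = [f / s for f, s in zip(chicks, eggs)]
    p_mean = sum(p) / n
    q_mean = 1 - p_mean
    spread = sum((pi - p_mean) ** 2 for pi in p) / n
    correction = p_mean * q_mean * sum(1 / s for s in eggs) / n  # equation (5)
    # Assumption of this sketch: a negative estimate is replaced by zero.
    between = max(spread - correction, 0.0)
    variances = [p_mean * q_mean / s + (s - 1) / s * between for s in eggs]
    weights = [1 / v for v in variances]  # equation (9)
    p_bar = sum(w * pi for w, pi in zip(weights, p)) / sum(weights)
    var_bar = sum(w * (pi - p_bar) ** 2 for w, pi in zip(weights, p)) / (
        (n - 1) * sum(weights)
    )
    return p_bar, var_bar ** 0.5
```

When every bird contributes the same number of eggs the weights are equal and $\bar{p}$ reduces to the arithmetic mean of the $p_i$; with unequal numbers, birds with more eggs receive more weight.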

However, the problem is not quite so simple. In the first place, there is 
usually a small amount of positive correlation between the number of fertile 
eggs produced by a bird and the hatchability of those eggs. Secondly, as 
pointed out earlier in this paper, the hatchability of fertile eggs is influenced to 
some extent by the male birds used to fertilize the eggs. The error involved in 
neglecting the correlation between hatchability and number of fertile eggs incu- 
bated does not seem to be of much importance in those practical problems which 
have come to the author’s attention. The effects of differences among the male 
birds may be largely obviated in experimental work by frequently transferring 
male birds from lot to lot during the experimental period. 

The best test of the suitability of a particular formula for calculating the 
standard error of an average is to compare the value of the standard error 
calculated by means of the formula with the corresponding value obtained by 
direct calculation from the distribution of a number of such averages obtained 
under essentially the same conditions. The accompanying table gives the 
standard error of the weighted average hatchability of fertile eggs calculated 
for each of four lots of birds by means of equation (10), together with the corre- 
sponding values obtained from the distribution of averages. The former are 
designated as the “predicted” values and the latter are designated as the 
“observed” values. In the calculation of the “observed” values, the various
averages were assigned the same weights which were used in the calculation
of the “predicted” values.


Comparison of “predicted” and “observed” standard errors of the weighted average
hatchability of fertile eggs, calculated for each of four lots of birds

                        Standard error of p̄
Lot        p̄         “Predicted”    “Observed”

          0.7684       0.0287         0.0327
          0.7115       0.0533         0.0561
          0.6834       0.0355         0.0379
          0.7260       0.0615         0.0674


The data used in these calculations involved a total of 74 birds, approxi- 
mately equally divided among the four lots, and a total of 2,901 fertile eggs 
which were produced and incubated during an experimental period of 48 weeks. 
The agreement between the “predicted” and “observed” standard errors of the
weighted average hatchability for each lot of birds is excellent. However, the
author’s experience with biological data tends to make him doubt that such
close agreement will always be found when such data are subjected to the
above treatment. The agreement in the present illustration could be less
close without indicating that the method of calculating the “predicted”
standard errors is unsound.


BUREAU OF ANIMAL INDUSTRY, 
U. S. DEPARTMENT OF AGRICULTURE, 
WASHINGTON, D. C.





A METHOD FOR DETERMINING THE COEFFICIENTS OF A 
CHARACTERISTIC EQUATION 


By Paul Horst


For the characteristic equation 
— (—1)"(a" — qa"! + Cox”? ae + Ca) 


= (x ae ay) (x _ a2) i (x — Gy 


it is well known that 
where A ; is the sum of all 


(2) 


If n exceeds 3 or 4, the process of calculating all possible principal minors is 
very cumbersome. 


But another more systematic method of calculating the c’s may be adopted. 
Suppose we define 


(p) |} 
- ay, | 


(3) 


and 


(4) 


It may be proved¹ that

$$S_p = \sum_{i} a_{ii}^{(p)}. \tag{5}$$

But from Newton’s identities² we have

$$S_p + c_1 S_{p-1} + c_2 S_{p-2} + \cdots + c_{p-1} S_1 + p\,c_p = 0. \tag{6}$$


1 Muir, L. & Metzler, W. H., “‘A Treatise on the Theory of Determinants,” p. 606, 
§ 650 and 651. 


2 Dickson, L. E., “First Course in the Theory of Equations,’’ p. 134, 4 106. 










Newton’s identities are ordinarily employed for calculating the sums of the 
powers of the roots of a polynomial when the coefficients are known. They may 
be employed equally well, however, for calculating the coefficients when the 
sums of the powers are given. Thus by means of equations (5) and (6) the 
coefficients of (1) may be readily calculated.
If in (2) $a_{ij} = a_{ji}$, the calculation of the successive $A^p$ values is
straightforward. The determinant $A$ is used as a constant multiplier so that

$$A \cdot A = A^2, \quad A \cdot A^2 = A^3, \quad \ldots, \quad A \cdot A^{n-1} = A^n$$

and the multiplication is column by column. That is,

$$a_{ij}^{(1+p)} = \sum_{k=1}^{n} a_{ki}\, a_{kj}^{(p)}.$$
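The method can be sketched in code. Two assumptions are made here because equation (1) is not fully legible in this copy: the polynomial is taken in the form $x^n + c_1 x^{n-1} + \cdots + c_n$, which is the convention under which Newton's identities read as in (6), and $S_p$ is taken as the sum of the diagonal elements of $A^p$. The function name is hypothetical; the sign of the $c$'s may need adjusting to match the paper's own equation (1).

```python
from fractions import Fraction


def char_poly_coefficients(A):
    """Return [c_1, ..., c_n] for x^n + c_1 x^(n-1) + ... + c_n.

    The roots of this polynomial are the characteristic roots of the square
    matrix A (assumed convention; see the lead-in).
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    power = [row[:] for row in A]
    S = []
    for _ in range(n):
        S.append(sum(power[i][i] for i in range(n)))  # S_p = sum of a_ii^(p)
        power = [
            [sum(A[i][k] * power[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    c = []
    for p in range(1, n + 1):
        # Newton: S_p + c_1 S_(p-1) + ... + c_(p-1) S_1 + p c_p = 0.
        acc = S[p - 1] + sum(c[k - 1] * S[p - k - 1] for k in range(1, p))
        c.append(-acc / p)
    return c
```

For the symmetric matrix with 2 on the diagonal and 1 elsewhere (order 3), the characteristic roots are 4, 1, 1, and the sketch returns the coefficients of $(x - 4)(x - 1)^2 = x^3 - 6x^2 + 9x - 4$.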








THE GENERALIZED PROBLEM OF CORRECT MATCHINGS 


By Dwight W. Chapman


A method common to many experimental and testing procedures in psychology 
and education is to require an individual to match, as best he can, members of 
one series of items with members of a second series of quite different items certain 
of which are in some sense true apposites of items in the first series. Thus the 
experimental psychology of personality has often investigated the ability of 
graphologists or laymen to pair samples of handwriting produced by a group of 
persons with, say, character-sketches of these same persons; and the excess of 
correct matchings thus produced over the number to be expected by chance has 
been used as evidence that the expressive movement of handwriting affords 
characteristics diagnostic of personal traits. Fortunately, the excesses experi- 
mentally obtained have often been so large as obviously to exclude the operation 
of chance alone. But many empirical results show small excesses only; and the 
interpretation of such findings has not hitherto been subjected to rigid statistical 
analysis. 

The particular statistical problem resident in this experimental procedure is 
twofold, involving the estimation of the significance of (a) a given number of cor- 
rect matchings produced by one individual, and (b) a given mean number of cor- 
rect matchings produced by a group of individuals working with the same mate- 
rial independently. 

Furthermore, two cases arise in practice: (1) the two series of items are of 
equal length, and each item in either series has a true apposite in the other series; 
or (2) the two series may be of unequal length, in which case the longer series 
contains not only a true apposite for each item of the shorter series, but, in 
addition, a certain number of extra, irrelevant items which cannot be correctly 
matched with any items in the shorter series. I have already given the solution
to problems (a) and (b) for case (1).¹ But case (1) forms only a corollary of the
more general case (2), to the solution of which this present paper is devoted.


(a) The Significance of a Given Number of Correct Matchings Resulting 
from a Single Trial 


Let there be given a series of $u$ x-items,

$$x_1, x_2, \ldots, x_i, \ldots, x_u$$

and a series of $t$ y-items,

$$y_1, y_2, \ldots, y_t.$$


1 The Statistics of the Method of Correct Matchings, Amer. Jour. Psychol., 46, 1934, 
287-298. 












Let $t \le u$, and let the first $t$ x-items be in some sense true apposites of the
correspondingly numbered y-items, so that if $y_j$ be paired with $x_j$ ($j = 1, 2, \ldots, t$),
this pairing will constitute a correct matching.

The first problem which arises is that of determining the probability that a
single random arrangement of the $t$ y-items against $t$ of the x-items will result in
exactly $s$ ($s = 0, 1, 2, \ldots, t$) correct matchings.

We begin by putting the first $s$ y-items in correspondence with their apposite
x-items. Then the number of arrangements of the $t$ y-items in which only these
$s$ are correctly matched is the number of arrangements of the remaining $t - s$
y-items against the remaining $u - s$ x-items such that no correct matchings occur.
With respect to these items, let


$n$ = the number of all possible arrangements,

$n(Y_j)$ = the number of arrangements such that at least the $j$th item is
correctly matched with its apposite,

$n(Y_jY_k)$ = the number of arrangements such that at least both the $j$th and $k$th
items are matched with their apposites, etc.;

and let

$n(\bar{Y}_j)$ = the number of arrangements such that at least the $j$th item is not
matched with its apposite,

$n(\bar{Y}_j\bar{Y}_k)$ = the number of arrangements such that at least the $j$th and $k$th items
are not matched with their apposites, etc.

We have then to evaluate the expression $n(\bar{Y}_{s+1}\bar{Y}_{s+2} \cdots \bar{Y}_t)$, the number of
arrangements of the items remaining, after setting $s$ of them correctly matched,
such that no further correct matchings occur.

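Now it can be shown that²

$$\begin{aligned}
n(\bar{Y}_{s+1}\bar{Y}_{s+2} \cdots \bar{Y}_t) = n
&- [n(Y_{s+1}) + n(Y_{s+2}) + \cdots + n(Y_t)] \\
&+ [n(Y_{s+1}Y_{s+2}) + n(Y_{s+1}Y_{s+3}) + \cdots + n(Y_{t-1}Y_t)] \\
&- [n(Y_{s+1}Y_{s+2}Y_{s+3}) + \cdots + n(Y_{t-2}Y_{t-1}Y_t)] \\
&+ \cdots \\
&+ (-1)^{t-s}\, n(Y_{s+1}Y_{s+2} \cdots Y_t).
\end{aligned}$$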


The value of the expressions on the right side of this equation can be deter- 
mined as follows: 


2 H. Whitney, A Logical Expansion in Mathematics, Bull. Amer. Math. Soc., 1932, 572-
579.













The value of $n$ is the number of ways in which $t - s$ items can be arranged
against $u - s$ items, which is

$$\frac{(u - s)!}{[(u - s) - (t - s)]!} = \frac{(u - s)!}{(u - t)!}.$$

The value of the first bracket (the number of arrangements of these items
such that some one of them is correctly matched) is derived by holding one of
the items matched, which can be chosen in $t - s$ ways. This leaves $t - s - 1$
y-items, which can be arranged against the remaining $u - s - 1$ x-items in
$(u - s - 1)!/(u - t)!$ ways. The product of these two expressions gives us for
the value of the first bracket

$$n(Y_{s+1}) + \cdots + n(Y_t) = \frac{(t - s)\,(u - s - 1)!}{(u - t)!}.$$

To evaluate the second bracket, we hold two of the $t - s$ items matched, which
can be chosen in $(t - s)!/[2!\,(t - s - 2)!]$ ways. There remain $t - s - 2$
y-items which can be arranged against the remaining $u - s - 2$ x-items in
$(u - s - 2)!/(u - t)!$ ways. The product of these two expressions gives us

$$n(Y_{s+1}Y_{s+2}) + \cdots + n(Y_{t-1}Y_t) = \frac{(t - s)!\,(u - s - 2)!}{2!\,(t - s - 2)!\,(u - t)!}.$$
Continuing thus, we develop the following series for the number of arrangements
of $t$ items against $u$ items such that the first $s$ are correctly matched:

$$n(\bar{Y}_{s+1} \cdots \bar{Y}_t) = \frac{(u - s)!}{(u - t)!} - \frac{(t - s)\,(u - s - 1)!}{(u - t)!} + \frac{(t - s)!\,(u - s - 2)!}{2!\,(t - s - 2)!\,(u - t)!} - \cdots + (-1)^{t-s} \frac{(t - s)!\,(u - t)!}{(t - s)!\,(u - t)!}.$$

In order to express the number of arrangements, $N_{(s)}$, such that any $s$ correct
matchings occur, we must multiply the above series by $t!/[s!\,(t - s)!]$, which is
the number of ways in which $s$ items can be chosen from $t$ items:

$$N_{(s)} = \frac{t!}{s!\,(t - s)!} \left[\frac{(u - s)!}{(u - t)!} - \frac{(t - s)\,(u - s - 1)!}{(u - t)!} + \cdots + (-1)^{t-s} \frac{(t - s)!\,(u - t)!}{(t - s)!\,(u - t)!}\right].$$

And in order to obtain the probability that a single random arrangement will
result in exactly $s$ correct matchings, we must further divide by $u!/(u - t)!$,
which is the total number of ways in which $t$ items can be arranged against
$u$ items. Calling this probability $P_{(s)}$, we have then

$$P_{(s)} = \frac{t!\,(u - t)!}{u!\,s!\,(t - s)!} \left[\frac{(u - s)!}{(u - t)!} - \frac{(t - s)\,(u - s - 1)!}{(u - t)!} + \cdots + (-1)^{t-s} \frac{(t - s)!\,(u - t)!}{(t - s)!\,(u - t)!}\right].$$








88 DWIGHT W. CHAPMAN 





Finally, factoring $(t - s)!/(u - t)!$ out of all terms in the bracket, the series simplifies to³

$$P_{(s)} = \frac{t!}{s!\,u!}\left[\frac{(u-s)!}{0!\,(t-s)!} - \frac{(u-s-1)!}{1!\,(t-s-1)!} + \frac{(u-s-2)!}{2!\,(t-s-2)!} - \cdots + (-1)^{t-s}\frac{(u-t)!}{(t-s)!\,0!}\right]. \tag{1}$$
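As a check on equation (1), the following minimal sketch evaluates the formula with exact fractions and compares it with a brute-force count over every arrangement. The function names `p_exact` and `p_brute` are illustrative and not from the paper; the code assumes that y-item i is "correctly matched" when it is placed against x-item i.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_exact(s, t, u):
    """Probability of exactly s correct matchings of t items against u items, equation (1)."""
    bracket = sum(
        Fraction((-1) ** j * factorial(u - s - j),
                 factorial(j) * factorial(t - s - j))
        for j in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * bracket


def p_brute(s, t, u):
    """Same probability by listing all u!/(u-t)! arrangements (small cases only)."""
    hits = total = 0
    for arrangement in permutations(range(u), t):
        total += 1
        # y-item number y is correctly matched when it sits against x-item number y.
        correct = sum(1 for y, x in enumerate(arrangement) if y == x)
        if correct == s:
            hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    t, u = 3, 5
    for s in range(t + 1):
        print(s, p_exact(s, t, u), p_brute(s, t, u))
```

Exact rational arithmetic is used so that the alternating series is compared with the enumeration without rounding error.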


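In any practical situation, the significant question is not the probability that exactly $s$ correct matchings shall occur, but the probability of $s$ or more correct matchings. Obviously

$$P_{(s\ \text{or more})} = P_{(s)} + P_{(s+1)} + \cdots + P_{(t)},$$

whence, by equation (1),

$$\begin{aligned}
P_{(s\ \text{or more})} ={}& \frac{t!}{s!\,u!}\left[\frac{(u-s)!}{0!\,(t-s)!} - \frac{(u-s-1)!}{1!\,(t-s-1)!} + \frac{(u-s-2)!}{2!\,(t-s-2)!} - \cdots + (-1)^{t-s}\frac{(u-t)!}{(t-s)!\,0!}\right] \\
&+ \frac{t!}{(s+1)!\,u!}\left[\frac{(u-s-1)!}{0!\,(t-s-1)!} - \frac{(u-s-2)!}{1!\,(t-s-2)!} + \cdots + (-1)^{t-s-1}\frac{(u-t)!}{(t-s-1)!\,0!}\right] \\
&+ \frac{t!}{(s+2)!\,u!}\left[\frac{(u-s-2)!}{0!\,(t-s-2)!} - \cdots + (-1)^{t-s-2}\frac{(u-t)!}{(t-s-2)!\,0!}\right] \\
&+ \cdots + \frac{t!}{t!\,u!}\left[\frac{(u-t)!}{0!\,0!}\right].
\end{aligned}$$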


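Or, collecting terms in a form better suited to practical computation from tables of factorials and reciprocals,

$$\begin{aligned}
P_{(s\ \text{or more})} = \frac{t!}{u!}\Bigg\{ & \frac{(u-s)!}{(t-s)!}\left[\frac{1}{0!\,s!}\right] + \frac{(u-s-1)!}{(t-s-1)!}\left[\frac{1}{0!\,(s+1)!} - \frac{1}{1!\,s!}\right] \\
& + \frac{(u-s-2)!}{(t-s-2)!}\left[\frac{1}{0!\,(s+2)!} - \frac{1}{1!\,(s+1)!} + \frac{1}{2!\,s!}\right] + \cdots \\
& + \frac{(u-t)!}{0!}\left[\frac{1}{0!\,t!} - \frac{1}{1!\,(t-1)!} + \frac{1}{2!\,(t-2)!} - \cdots + (-1)^{t-s}\frac{1}{(t-s)!\,s!}\right]\Bigg\}.
\end{aligned} \tag{2}$$

3 In the special case in which the series of x-items and the series of y-items are of the same length, whence $t = u$, equation (1) reduces to

$$P_{(s)} = \frac{1}{s!}\left[\frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \cdots + (-1)^{t-s}\frac{1}{(t-s)!}\right].$$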
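The probability of s or more correct matchings can be computed either by summing equation (1) directly or from the collected form, in which each factorial ratio is multiplied by a short sum of reciprocals. The sketch below, with illustrative function names that are not from the paper, implements both and lets one confirm that they agree.

```python
from fractions import Fraction
from math import factorial


def p_exact(s, t, u):
    """Equation (1): probability of exactly s correct matchings."""
    bracket = sum(
        Fraction((-1) ** j * factorial(u - s - j),
                 factorial(j) * factorial(t - s - j))
        for j in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * bracket


def p_at_least(s, t, u):
    """Direct sum P(s) + P(s+1) + ... + P(t)."""
    return sum(p_exact(r, t, u) for r in range(s, t + 1))


def p_at_least_collected(s, t, u):
    """Collected form: each ratio (u-s-j)!/(t-s-j)! times an inner sum of reciprocals."""
    total = Fraction(0)
    for j in range(t - s + 1):
        ratio = Fraction(factorial(u - s - j), factorial(t - s - j))
        inner = sum(
            Fraction((-1) ** (j - i), factorial(j - i) * factorial(s + i))
            for i in range(j + 1)
        )
        total += ratio * inner
    return Fraction(factorial(t), factorial(u)) * total


if __name__ == "__main__":
    for s in range(6):
        print(s, float(p_at_least(s, 5, 8)))
```

The inner sums depend only on s and j, which is what makes the collected form convenient when working from tables of factorials and reciprocals.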


(b) The Significance of a Given Mean Number of Correct Matchings Result- 
ing from n Independent Trials 


A frequent practical situation is that in which interest centers on the signifi- 
cance of the mean number of correct matchings achieved by a group of n indi- 
viduals working independently with the same two series. 

In order to determine the probability that the mean number of correct matchings, $\bar{s}$, resulting from $n$ independent trials shall equal or exceed a given value, we are required to describe the distribution of the means of samples of size $n$ drawn at random from a parent population in which the variable is $s\ (= 0, 1, 2, \ldots, t)$ with relative frequencies $P_{(0)}, P_{(1)}, P_{(2)}, \ldots, P_{(t)}$, given by equation (1). The tabulation of this parent distribution follows:


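Table I: Distribution of s

s    Relative frequency $(= P_{(s)})$

0    $\dfrac{t!}{0!\,u!}\left[\dfrac{u!}{0!\,t!} - \dfrac{(u-1)!}{1!\,(t-1)!} + \dfrac{(u-2)!}{2!\,(t-2)!} - \dfrac{(u-3)!}{3!\,(t-3)!} + \cdots + (-1)^{t}\dfrac{(u-t)!}{t!\,0!}\right]$

1    $\dfrac{t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{(u-t)!}{(t-1)!\,0!}\right]$

2    $\dfrac{t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{(u-t)!}{(t-2)!\,0!}\right]$

...

t    $\dfrac{t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]$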


We now determine the first four moments, $\nu_1$, $\nu_2$, $\nu_3$, and $\nu_4$, of this distribution about the origin $s = 0$. Since, in general,

$$\nu_k = \sum_{s=0}^{t}\left[s^k \times (\text{Relative frequency of } s)\right] = \sum_{s=0}^{t} s^k P_{(s)},$$


the tabulation for the computation of any moment is as follows: 





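Table II: The Computation of the kth Moment of the Distribution of s

s    $s^k P_{(s)}$

0    $0$

1    $\dfrac{1^k\,t!}{1!\,u!}\left[\dfrac{(u-1)!}{0!\,(t-1)!} - \dfrac{(u-2)!}{1!\,(t-2)!} + \dfrac{(u-3)!}{2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{(u-t)!}{(t-1)!\,0!}\right]$

2    $\dfrac{2^k\,t!}{2!\,u!}\left[\dfrac{(u-2)!}{0!\,(t-2)!} - \dfrac{(u-3)!}{1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{(u-t)!}{(t-2)!\,0!}\right]$

3    $\dfrac{3^k\,t!}{3!\,u!}\left[\dfrac{(u-3)!}{0!\,(t-3)!} - \cdots + (-1)^{t-3}\dfrac{(u-t)!}{(t-3)!\,0!}\right]$

...

t    $\dfrac{t^k\,t!}{t!\,u!}\left[\dfrac{(u-t)!}{0!\,0!}\right]$

Noting that

$$\frac{1^k}{1!} = \frac{1^{k-1}}{0!},\qquad \frac{2^k}{2!} = \frac{2^{k-1}}{1!},\qquad \ldots,\qquad \frac{t^k}{t!} = \frac{t^{k-1}}{(t-1)!},$$

and multiplying the terms in brackets by these factors, we develop Table III: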
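Table III

(Each row is the corresponding row of Table II after the multiplication; the $d$th diagonal collects the terms containing $(u-d)!$.)

$s = 1$:  $\dfrac{t!}{u!}\left[\dfrac{1^{k-1}(u-1)!}{0!\,0!\,(t-1)!} - \dfrac{1^{k-1}(u-2)!}{0!\,1!\,(t-2)!} + \dfrac{1^{k-1}(u-3)!}{0!\,2!\,(t-3)!} - \cdots + (-1)^{t-1}\dfrac{1^{k-1}(u-t)!}{0!\,(t-1)!\,0!}\right]$   ($t$ terms)

$s = 2$:  $\dfrac{t!}{u!}\left[\dfrac{2^{k-1}(u-2)!}{1!\,0!\,(t-2)!} - \dfrac{2^{k-1}(u-3)!}{1!\,1!\,(t-3)!} + \cdots + (-1)^{t-2}\dfrac{2^{k-1}(u-t)!}{1!\,(t-2)!\,0!}\right]$   ($t - 1$ terms)

$s = 3$:  $\dfrac{t!}{u!}\left[\dfrac{3^{k-1}(u-3)!}{2!\,0!\,(t-3)!} - \cdots + (-1)^{t-3}\dfrac{3^{k-1}(u-t)!}{2!\,(t-3)!\,0!}\right]$   ($t - 2$ terms)

...

$s = t$:  $\dfrac{t!}{u!}\left[\dfrac{t^{k-1}(u-t)!}{(t-1)!\,0!\,0!}\right]$   (1 term)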








Since each series in brackets is one term shorter than the preceding series, the table forms a system of $t$ diagonals. The sum which gives us $\nu_k$ may therefore be considered as the sum of these diagonals.

Now, from inspection, it is evident that the general diagonal is of the form

$$\begin{aligned}
s\text{th diagonal} &= \frac{t!\,(u-s)!}{u!\,(t-s)!}\left[\frac{s^{k-1}}{(s-1)!\,0!} - \frac{(s-1)^{k-1}}{(s-2)!\,1!} + \cdots + (-1)^{s-1}\frac{1^{k-1}}{0!\,(s-1)!}\right] \\
&= \frac{t!\,(u-s)!}{u!\,(t-s)!\,(s-1)!}\sum_{r=0}^{s-1}(-1)^{r}(s-r)^{k-1}\binom{s-1}{r}.
\end{aligned}$$


But it can be shown⁴ that

$$\sum_{r=0}^{s-1}(-1)^{r}(s-r)^{k-1}\binom{s-1}{r} = 0 \quad\text{when } k < s.$$

Whence

$$s\text{th diagonal} = 0 \quad\text{when } k < s.$$


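Therefore $\nu_k$ is given simply by the sum of the first $k$ diagonals of Table III. Or, in general,

$$\begin{aligned}
\nu_k ={}& \frac{t!\,(u-1)!}{u!\,(t-1)!}\left[\frac{1^{k-1}}{0!\,0!}\right] + \frac{t!\,(u-2)!}{u!\,(t-2)!}\left[\frac{2^{k-1}}{1!\,0!} - \frac{1^{k-1}}{0!\,1!}\right] \\
&+ \frac{t!\,(u-3)!}{u!\,(t-3)!}\left[\frac{3^{k-1}}{2!\,0!} - \frac{2^{k-1}}{1!\,1!} + \frac{1^{k-1}}{0!\,2!}\right] + \cdots \\
&+ \frac{t!\,(u-k)!}{u!\,(t-k)!}\left[\frac{k^{k-1}}{(k-1)!\,0!} - \frac{(k-1)^{k-1}}{(k-2)!\,1!} + \cdots + (-1)^{k-1}\frac{1^{k-1}}{0!\,(k-1)!}\right].
\end{aligned} \tag{3}$$

To this equation we must, of course, add the condition $k \le t$.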





4 E. Netto, Lehrbuch der Combinatorik, Leipzig, 1901, 249, Formula 17.










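Solving now for the first four moments, we have

$$\begin{aligned}
\nu_1 &= \frac{t}{u}, \\
\nu_2 &= \frac{t}{u}\left[1 + \frac{t-1}{u-1}\right], \\
\nu_3 &= \frac{t}{u}\left[1 + 3\,\frac{t-1}{u-1} + \frac{(t-1)(t-2)}{(u-1)(u-2)}\right], \\
\nu_4 &= \frac{t}{u}\left[1 + 7\,\frac{t-1}{u-1} + 6\,\frac{(t-1)(t-2)}{(u-1)(u-2)} + \frac{(t-1)(t-2)(t-3)}{(u-1)(u-2)(u-3)}\right].
\end{aligned}$$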
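The closed forms for the first four moments about the origin can be compared with the defining sum of $s^k P_{(s)}$ over equation (1). The sketch below does this in exact arithmetic; the helper names are illustrative, and the closed form for the fourth moment is evaluated only for u of at least 4 so that no denominator vanishes.

```python
from fractions import Fraction
from math import factorial


def p_exact(s, t, u):
    """Equation (1): probability of exactly s correct matchings."""
    bracket = sum(
        Fraction((-1) ** j * factorial(u - s - j),
                 factorial(j) * factorial(t - s - j))
        for j in range(t - s + 1)
    )
    return Fraction(factorial(t), factorial(s) * factorial(u)) * bracket


def raw_moment(k, t, u):
    """k-th moment of s about the origin, straight from the definition."""
    return sum(Fraction(s) ** k * p_exact(s, t, u) for s in range(t + 1))


def raw_moments_closed(t, u):
    """Closed forms for the first four moments about the origin (requires u >= 4)."""
    a = Fraction(t, u)
    b = Fraction(t - 1, u - 1)
    c = Fraction(t - 2, u - 2)
    d = Fraction(t - 3, u - 3)
    return [
        a,
        a * (1 + b),
        a * (1 + 3 * b + b * c),
        a * (1 + 7 * b + 6 * b * c + b * c * d),
    ]


if __name__ == "__main__":
    print([raw_moment(k, 5, 8) for k in range(1, 5)])
    print(raw_moments_closed(5, 8))
```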


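If now we define, for convenience,

$$a = \frac{t}{u},\qquad b = \frac{t-1}{u-1},\qquad c = \frac{t-2}{u-2},\qquad d = \frac{t-3}{u-3},$$

we have, for the constants of the distribution of $s$,

$$\begin{aligned}
\text{Mean} &= \nu_1 = a, \\
\mu_2 &= \nu_2 - \nu_1^2 = a(1+b) - a^2, \quad\text{whence } \sigma = \sqrt{a(1+b) - a^2}, \\
\mu_3 &= \nu_3 - 3\nu_1\nu_2 + 2\nu_1^3 = a(1+3b+bc) - 3a^2(1+b) + 2a^3, \\
\mu_4 &= \nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 \\
&= a(1+7b+6bc+bcd) - 4a^2(1+3b+bc) + 6a^3(1+b) - 3a^4.
\end{aligned} \tag{5}$$

From these constants we can determine the skewness and kurtosis of the distribution of $s$,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad\text{and}\quad \beta_2 = \frac{\mu_4}{\mu_2^2}. \tag{6}$$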





Now it is known that the means of samples of size $n$ drawn from a parent population with constants $\beta_1$ and $\beta_2$ are distributed in such a way that

$$\beta_{1(\text{means})} = \frac{\beta_1}{n} \quad\text{and}\quad \beta_{2(\text{means})} = 3 + \frac{\beta_2 - 3}{n}. \tag{7}$$

Therefore, having determined the beta-constants for the distribution of $s$, we can determine the beta-constants of the distribution of $\bar{s}$, the mean number of correct matchings resulting from $n$ independent trials.

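Now when $t = u = 4$, we have

$$a = b = c = d = 1,$$

and equations (5) give us for the distribution of $s$

$$\text{Mean} = 1,\quad \mu_2 = 1,\quad \mu_3 = 1,\quad \mu_4 = 4,\quad\text{whence}\quad \beta_1 = 1,\quad \beta_2 = 4,$$

and therefore, for the constants of the distribution of $\bar{s}$, we have, by equations (7),

$$\beta_1 = \frac{1}{n}, \quad\text{and}\quad \beta_2 = 3 + \frac{1}{n},$$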
which indicates a positively skewed and leptokurtic distribution. The effect of 


increasing u and holding ¢ constant is to increase the skewness, as shown in the 
following table for ¢ = 5: 


t    u    $\beta_1$ (means)

5    5    1.00/n
5    6    [illegible in source]/n
5    7    1.16/n
5    8    [illegible in source]/n
5    9    [illegible in source]/n
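The beta-constants can be recomputed from equations (5), (6) and (7) for any t and u. The following is a minimal sketch with illustrative function names; it is offered as a way to regenerate the kind of table shown above, not as a record of the values the author printed, which may have been rounded differently.

```python
from fractions import Fraction


def central_moments(t, u):
    """Mean and second to fourth central moments of s from equations (5); needs u >= 4."""
    a = Fraction(t, u)
    b = Fraction(t - 1, u - 1)
    c = Fraction(t - 2, u - 2)
    d = Fraction(t - 3, u - 3)
    mu2 = a * (1 + b) - a ** 2
    mu3 = a * (1 + 3 * b + b * c) - 3 * a ** 2 * (1 + b) + 2 * a ** 3
    mu4 = (a * (1 + 7 * b + 6 * b * c + b * c * d)
           - 4 * a ** 2 * (1 + 3 * b + b * c)
           + 6 * a ** 3 * (1 + b) - 3 * a ** 4)
    return a, mu2, mu3, mu4


def betas_of_s(t, u):
    """Equations (6): beta_1 and beta_2 of the distribution of s."""
    _, mu2, mu3, mu4 = central_moments(t, u)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def betas_of_mean(t, u, n):
    """Equations (7): beta-constants of the mean of n independent trials."""
    beta1, beta2 = betas_of_s(t, u)
    return beta1 / n, 3 + (beta2 - 3) / n


if __name__ == "__main__":
    for u in range(5, 10):
        beta1, _ = betas_of_s(5, u)
        print(f"t=5 u={u}: beta_1 of means = {float(beta1):.3f}/n")
```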


The degrees of skewness and kurtosis met with in practical cases of matching 
with any considerable number of judges (n) are such that a Pearson Type III 
distribution curve gives a reasonably good fit to the distribution of mean num- 
bers of correct matchings. If, therefore, we have to determine the significance 







of any obtained mean number of correct matchings, we may resort to Salvosa’s 
tables® of the area under the Type III curve. 

As a concrete example of the application of this method let us imagine that 10 
judges have arranged 5 character sketches against 8 specimens of handwriting, 
5 of which are true apposites of the sketches. Let the total number of correct 


matchings achieved by this group be 12, whence the mean number per judge is 
1.2. We have, then, 















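$$\bar{s} = 1.2,\quad n = 10,\quad t = 5,\quad u = 8,\quad\text{whence}\quad a = \tfrac{5}{8},\quad b = \tfrac{4}{7},\quad c = \tfrac{3}{6}.$$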





We now find the mean, standard deviation, and $\beta_1$ of the distribution of $\bar{s}$, as follows:

The mean of the distribution of $\bar{s}$ is, by sampling theory, the same as the mean of the distribution of $s$:

$$\text{Mean} = a = .625.$$

The second moment of the distribution of $\bar{s}$ is, by sampling theory, $1/n$ times the second moment of the distribution of $s$; whence, by equation (5),

$$\text{Standard deviation} = \sqrt{\frac{1}{n}\left[a(1+b) - a^2\right]} = .243.$$

And, by equations (5) and (7),

$$\beta_1 = \frac{1}{n}\cdot\frac{\left[a(1+3b+bc) - 3a^2(1+b) + 2a^3\right]^2}{\left[a(1+b) - a^2\right]^3} = .131.$$












Now the obtained mean number of correct matchings was 1.2, and the next lower number which could have occurred (corresponding to a total of 11 instead of 12 for the group of judges) is 1.1. The lower boundary of the class-interval whose midpoint is $\bar{s} = 1.2$ is therefore 1.15; and it is the area above this boundary under the curve of $\bar{s}$ in which we are interested.


5 L. R. Salvosa, Tables of Pearson's Type III function, Ann. Math. Statist., 1, 1930, 191-198.











The deviation of this boundary from the mean of $\bar{s}$ is

$$1.15 - .625 = .525,$$

and this deviation expressed in terms of the standard deviation gives

$$\frac{.525}{.243} = 2.16.$$

Entering Salvosa's table for the deviation 2.16 and skewness $= \sqrt{\beta_1} = .36$, we find by interpolation that so good a performance should be expected by chance only about 23 times in 1000.
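The arithmetic of this worked example (10 judges, 5 sketches against 8 specimens, 12 correct matchings in all) can be retraced as follows. This is a sketch with an illustrative function name; it reproduces only the quantities entered into the table of the Type III curve, and the final table look-up is not reproduced.

```python
from math import sqrt


def matching_example(t=5, u=8, n=10, total_correct=12):
    """Mean, standard deviation, beta_1 and standardized deviation for the example."""
    a, b, c = t / u, (t - 1) / (u - 1), (t - 2) / (u - 2)
    mu2 = a * (1 + b) - a ** 2
    mu3 = a * (1 + 3 * b + b * c) - 3 * a ** 2 * (1 + b) + 2 * a ** 3
    s_bar = total_correct / n
    sd_of_mean = sqrt(mu2 / n)              # second moment of the mean is mu2 / n
    beta1_of_mean = (mu3 ** 2 / mu2 ** 3) / n
    # Possible means are spaced 1/n apart, so the class boundary lies half a step below.
    lower_boundary = s_bar - 0.5 / n
    deviation = lower_boundary - a
    return {
        "mean": a,
        "sd": sd_of_mean,
        "beta1": beta1_of_mean,
        "deviation": deviation,
        "standardized": deviation / sd_of_mean,
        "skewness": sqrt(beta1_of_mean),
    }


if __name__ == "__main__":
    for name, value in matching_example().items():
        print(f"{name}: {value:.4f}")
```

Small differences from the printed figures in the last decimal place are to be expected, since the text rounds the standard deviation before dividing.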








MOMENTS ABOUT THE ARITHMETIC MEAN OF A BINOMIAL 
FREQUENCY DISTRIBUTION 


W. J. Kirkham, Oregon State College


Although the most useful moments of a binomial distribution have been 
derived as a function of the parameters of the generating binomial for any 
binomial frequency series, a generalization of notation and procedure is well 
worth our consideration. The problem attempted in this paper is the calcula- 
tion of the moments about the mean for the general frequency series of Table I. 


TABLE I

The generalized binomial frequency series

x (item)    f (frequency)
0           $N\cdot{}_nC_0\,p^0q^{n}$
1           $N\cdot{}_nC_1\,p^1q^{n-1}$
2           $N\cdot{}_nC_2\,p^2q^{n-2}$
...         ...
n           $N\cdot{}_nC_n\,p^nq^{0}$








In calculating the moments of a set of data about any value, it is often found 
convenient to use an arbitrary origin, define the moments about this value, and 
represent the desired moments in terms of those calculated. In the general 


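binomial series, the origin of the x's is found to be the best arbitrary origin. These intermediate moments are

$$\nu_1 = \frac{\sum fx}{N} = M,\ \text{arithmetic mean};\qquad \nu_2 = \frac{\sum fx^2}{N};\qquad \ldots;\qquad \nu_i = \frac{\sum fx^i}{N}, \tag{1}$$

where $\nu_i$ is the $i$th moment.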
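The moments ($\mu$'s) about the mean are easily defined as functions of the $\nu$'s from fundamental definitions of the $\mu$'s. Denoting the $i$th moment by $\mu_i$, we have

$$\begin{aligned}
\mu_1 &= \frac{\sum f(x-M)}{N} = 0, \\
\mu_2 &= \frac{\sum f(x-M)^2}{N} = \nu_2 - \nu_1^2, \\
\mu_3 &= \frac{\sum f(x-M)^3}{N} = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3.
\end{aligned} \tag{2}$$

In general,

$$\mu_n = \nu_n - {}_nC_1\,\nu_{n-1}\nu_1 + {}_nC_2\,\nu_{n-2}\nu_1^2 - \cdots + (-1)^{n-1}\left({}_nC_{n-1} - 1\right)\nu_1^n. \tag{3}$$

Or, if we let $\{\nu\}^n = \nu_n$, we may express the $n$th moment by a simple notation.

$$\mu_n = \{\mu\}^n = \{\nu\}^n - {}_nC_1\{\nu\}^{n-1}\nu_1 + {}_nC_2\{\nu\}^{n-2}\nu_1^2 - \cdots = (\{\nu\} - \nu_1)^n. \tag{4}$$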


Solving the equation for $\{\nu\}$,

$$\{\nu\} = \{\mu\} + \nu_1.$$

Raising both sides to the $n$th power and substituting for the "brace" notation,

$$\nu_n = \mu_n + {}_nC_1\,\mu_{n-1}\nu_1 + {}_nC_2\,\mu_{n-2}\nu_1^2 + \cdots + \nu_1^n.$$

Whence

$$\mu_n = \nu_n - {}_nC_1\,\mu_{n-1}\nu_1 - {}_nC_2\,\mu_{n-2}\nu_1^2 - \cdots - \nu_1^n, \tag{5}$$

a semi-recursion formula.

The original formula for $\mu_n$ contained $n$ moments or variables; and since there are only $(n - 2)$ of the $\mu$'s which are of lower order than $\mu_n$, it is necessary to retain $\nu_n$ and $\nu_1$ in (5). Since $\mu_1 = 0$, one term in the expansion of $\mu_n$ is zero. For instance, when $n = 5$, we have

$$\mu_5 = \nu_5 - 5\mu_4\nu_1 - 10\mu_3\nu_1^2 - 10\mu_2\nu_1^3 - \nu_1^5.$$
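The semi-recursion formula lends itself to a short loop: each central moment is obtained from the raw moment of the same order and the central moments already found. The sketch below, with illustrative function names and exact fractions, applies it to a binomial series; the convention that the zeroth central moment is 1 is an assumption made here so that the final term of (5) appears automatically.

```python
from fractions import Fraction
from math import comb


def binomial_raw_moments(n, p, kmax):
    """Moments about the origin, orders 0..kmax, of the binomial (q + p)^n."""
    q = 1 - p
    return [
        sum(Fraction(x) ** k * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1))
        for k in range(kmax + 1)
    ]


def central_from_raw(nu):
    """Semi-recursion (5): mu_n = nu_n - sum_{j>=1} C(n, j) * mu_{n-j} * nu_1^j."""
    mu = [Fraction(1)]  # mu_0 = 1 by convention
    for n in range(1, len(nu)):
        correction = sum(comb(n, j) * mu[n - j] * nu[1] ** j for j in range(1, n + 1))
        mu.append(nu[n] - correction)
    return mu


if __name__ == "__main__":
    n, p = 6, Fraction(1, 3)
    print(central_from_raw(binomial_raw_moments(n, p, 5)))
```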


To calculate $\mu_k$, it is necessary to calculate the $\nu$'s from $\nu_1$ to $\nu_k$. For the binomial series, these $\nu$'s are

$$\begin{aligned}
\nu_1 &= 1\cdot n\,p\,q^{n-1} + 2\cdot\frac{n(n-1)}{2!}\,p^2q^{n-2} + 3\cdot\frac{n(n-1)(n-2)}{3!}\,p^3q^{n-3} + \cdots \\
&= np\left[q^{n-1} + (n-1)\,p\,q^{n-2} + \frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + p^{n-1}\right] \\
&= np\,(q + p)^{n-1} = np, \\
\nu_2 &= np\left[1\cdot q^{n-1} + 2\,(n-1)\,p\,q^{n-2} + 3\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n\,p^{n-1}\right], \\
\nu_3 &= np\left[1^2 q^{n-1} + 2^2(n-1)\,p\,q^{n-2} + 3^2\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^2p^{n-1}\right], \\
&\;\;\vdots \\
\nu_k &= np\left[1^{k-1}q^{n-1} + 2^{k-1}(n-1)\,p\,q^{n-2} + 3^{k-1}\,\frac{(n-1)(n-2)}{2!}\,p^2q^{n-3} + \cdots + n^{k-1}p^{n-1}\right].
\end{aligned}$$













In the simplified form of $\nu_k$, the [ ] is the $(k - 1)$th moment about $-1$ of the binomial series generated by the binomial $(q + p)^{n-1}$. Denoting this [ ] by $\nu'_{k-1}(n - 1)$, the $\nu$'s can be expressed by the formula

$$\nu_k = np\,\nu'_{k-1}(n - 1), \tag{6}$$

where $\nu'$ is a function of $(n - 1)$ and $(k - 1)$ while $\nu_k$ was a function of $n$ and $k$.

Let us see how a $\nu'$ in $\nu_k$ can be defined in terms of the $\nu$'s of lower order than $k$. In finding this relationship, a consideration of the two series of Table II will be helpful.


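TABLE II

x'      f                                      x        f
1       $N\cdot{}_{n-1}C_0\,p^0q^{n-1}$        0        $N\cdot{}_{n-1}C_0\,p^0q^{n-1}$
2       $N\cdot{}_{n-1}C_1\,p^1q^{n-2}$        1        $N\cdot{}_{n-1}C_1\,p^1q^{n-2}$
...     ...                                    ...      ...
n       $N\cdot{}_{n-1}C_{n-1}\,p^{n-1}q^{0}$  n - 1    $N\cdot{}_{n-1}C_{n-1}\,p^{n-1}q^{0}$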










The [ ] in $\nu_k$ for Table I is equal to the $(k - 1)$th moment of $x'$ about $x' = 0$. Or

$$\nu_{k-1},\ \text{Table II},\ x', = \nu'_{k-1},\ \text{Table I}, = \nu'_{k-1}(n - 1).$$












Also $\nu_{k-1}$ for $x$, Table II, is $\nu_{k-1}$ for the series generated by $(q + p)^{n-1}$.

The desired relationship between the $\nu$'s for the two series of Table II can be found by making use of the equations expressing the equality of the $\mu$'s for $x$ and $x'$. Dropping the variable which shows the number of items, the same for the two series of Table II, in the notation, we have

$$\begin{aligned}
\mu'_2 &= \mu_2 = \nu_2 - \nu_1^2 = \nu'_2 - \nu_1'^2, \qquad \nu'_2 = \nu_2 - \nu_1^2 + \nu_1'^2, \\
\mu'_3 &= \mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 = \nu'_3 - 3\nu'_2\nu'_1 + 2\nu_1'^3, \\
\nu'_3 &= \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 + 3\nu'_2\nu'_1 - 2\nu_1'^3.
\end{aligned}$$

Substituting the value of $\nu'_2$ in the right member of $\nu'_3$,

$$\nu'_3 = \nu_3 - 3\nu_2(\nu_1 - \nu'_1) + 3\nu_1(\nu_1 - \nu'_1)^2 - (\nu_1 - \nu'_1)^3.$$

In general,

$$\nu'_k = \nu_k - {}_kC_1\,\nu_{k-1}(\nu_1 - \nu'_1) + {}_kC_2\,\nu_{k-2}(\nu_1 - \nu'_1)^2 - \cdots + (-1)^k(\nu_1 - \nu'_1)^k. \tag{7}$$

The formula just derived may be used to define the moments about any origin in terms of those about the original zero of the x's. For our immediate use, the formula simplifies since $\nu'_1 = \nu_1 + \nu_0 = \nu_1 + 1$. Then

$$\nu'_k = \nu_k + {}_kC_1\,\nu_{k-1} + {}_kC_2\,\nu_{k-2} + {}_kC_3\,\nu_{k-3} + \cdots \tag{8}$$


By simple analysis we found the value of $\nu_1$ to be $np$. By the method of continuation, we are able to extend the list of $\nu$'s to any number. $\nu'$ from (8) is used in (6) with $n$ replaced by $(n - 1)$ in the $\nu$'s.


$$\begin{aligned}
\nu_0 &= 1. \\
\nu_1 &= np. \\
\nu_2 &= np\,\nu'_1(n-1) = np\,[\nu_1(n-1) + \nu_0(n-1)] \\
&= np\,[(n-1)p + 1] = n(n-1)p^2 + np. \\
\nu_3 &= np\,\nu'_2(n-1) = np\,[\nu_2(n-1) + 2\nu_1(n-1) + \nu_0(n-1)] \\
&= n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np. \\
\nu_4 &= np\,\nu'_3(n-1) = np\,[\nu_3(n-1) + 3\nu_2(n-1) + 3\nu_1(n-1) + \nu_0(n-1)] \\
&= np\,\{[(n-1)(n-2)(n-3)p^3 + 3(n-1)(n-2)p^2 + (n-1)p] \\
&\qquad + 3[(n-1)(n-2)p^2 + (n-1)p] + 3[(n-1)p] + 1\} \\
&= n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np.
\end{aligned}$$

If the order of the terms in the expansion is reversed, $\nu_k$ is an ascending power series in $p$. The pure numerical coefficients in some of these $\nu$'s are

$\nu_1 = (1)$
$\nu_2 = (1, 1)$
$\nu_3 = (1, 3, 1)$
$\nu_4 = (1, 7, 6, 1)$
$\nu_5 = (1, 15, 25, 10, 1)$
$\nu_6 = (1, 31, 90, 65, 15, 1)$
$\nu_7 = (1, 63, 301, 350, 140, 21, 1)$
$\nu_8 = (1, 127, 966, 1701, 1050, 266, 28, 1)$.
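The method of continuation in (6) and (8) can be written as a recursion on the order k and the index n. The sketch below implements that recursion and, separately, regenerates the rows of numerical coefficients. Identifying these rows with the Stirling numbers of the second kind, and the powers of n with falling factorials n(n-1)...(n-j+1), is an observation added here and is not stated in the text; all function names are illustrative.

```python
from fractions import Fraction
from math import comb


def nu(k, n, p):
    """k-th moment about the origin of (q + p)^n via nu_k(n) = n p nu'_{k-1}(n-1)."""
    if k == 0:
        return Fraction(1)
    if n == 0:
        return Fraction(0)  # with n = 0 all frequency sits at x = 0
    # Equation (8): nu'_{k-1} is a binomial-weighted sum of lower-order moments.
    return n * p * sum(comb(k - 1, i) * nu(i, n - 1, p) for i in range(k))


def coefficient_row(k):
    """Coefficients of np, n(n-1)p^2, ... in nu_k (Stirling numbers of the second kind)."""
    row = [1]
    for m in range(2, k + 1):
        row = [
            (j + 1) * (row[j] if j < len(row) else 0) + (row[j - 1] if j >= 1 else 0)
            for j in range(m)
        ]
    return row


def nu_from_row(k, n, p):
    """nu_k rebuilt from its coefficient row and falling factorials of n."""
    total, falling = Fraction(0), 1
    for j, coeff in enumerate(coefficient_row(k), start=1):
        falling *= n - (j - 1)
        total += coeff * falling * p ** j
    return total


if __name__ == "__main__":
    for k in range(1, 9):
        print(k, coefficient_row(k))
```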










In general, 


n n =. 
Vatl = @ = iC, _ (.c. z -C;) ’ 
1 


1 2 
n t-—1 j-1 
62 Gi« § -«)...). 
3 2 1 


Using the foregoing $\nu$'s, and the semi-recursion formula, we are able to determine the $\mu$'s.


(9) 


$$\begin{aligned}
\mu_2 &= \nu_2 - \nu_1^2 \\
&= [np + n(n-1)p^2] - (np)^2 \\
&= np(1 - p) \\
&= npq. \\
\mu_3 &= \nu_3 - 3\mu_2\nu_1 - \nu_1^3 \\
&= [np + 3n(n-1)p^2 + n(n-1)(n-2)p^3] - 3(np)[np(1-p)] - (np)^3 \\
&= np + (-3n)p^2 + (2n)p^3 = np(1 - 3p + 2p^2) \\
&= np(1-p)(1-2p) \\
&= npq(q - p). \\
\mu_4 &= [np + 7n(n-1)p^2 + 6n(n-1)(n-2)p^3 + n(n-1)(n-2)(n-3)p^4] \\
&\qquad - 4(np)(np)(1 - 3p + 2p^2) - 6(np)^2(np)(1-p) - (np)^4 \\
&= np + (-7n + 3n^2)p^2 + (12n - 6n^2)p^3 + (-6n + 3n^2)p^4 \\
&= np(1 - 7p + 12p^2 - 6p^3) + 3n^2p^2(1 - 2p + p^2) \\
&= npq - 6np^2q^2 + 3n^2p^2q^2. \\
\mu_5 &= np(1 - 15p + 50p^2 - 60p^3 + 24p^4) + 10n^2p^2(1 - 4p + 5p^2 - 2p^3) \\
&= (q - p)(npq - 12np^2q^2 + 10n^2p^2q^2). \\
\mu_6 &= np(1 - 31p + 180p^2 - 390p^3 + 360p^4 - 120p^5) \\
&\qquad + 5n^2p^2(5 - 36p + 83p^2 - 78p^3 + 26p^4) + 15n^3p^3(1 - 3p + 3p^2 - p^3) \\
&= npq - 30np^2q^2(q-p)^2 + 25n^2p^2q^2 - 130n^2p^3q^3 + 15n^3p^3q^3. \\
\mu_7 &= np(1 - 63p + 602p^2 - 2100p^3 + 3360p^4 - 2520p^5 + 720p^6) \\
&\qquad + n^2p^2(56 - 686p + 2590p^2 - 4270p^3 + 3234p^4 - 924p^5) \\
&\qquad + n^3p^3(105 - 525p + 945p^2 - 735p^3 + 210p^4) \\
&= (q - p)(npq - 60np^2q^2 + 360np^3q^3 + 56n^2p^2q^2 - 462n^2p^3q^3 + 105n^3p^3q^3). \\
\mu_8 &= np(1 - 127p + 1932p^2 - 10206p^3 + 25200p^4 - 31920p^5 + 20160p^6 - 5040p^7) \\
&\qquad + n^2p^2(119 - 2394p + 13895p^2 - 35700p^3 + 46004p^4 - 29232p^5 + 7308p^6) \\
&\qquad + n^3p^3(490 - 3850p + 10990p^2 - 14770p^3 + 9520p^4 - 2380p^5) \\
&\qquad + n^4p^4(105 - 420p + 630p^2 - 420p^3 + 105p^4) \\
&= npq\,(1 - 42pq\,(3 - 40pq\,(1 - 3pq))) + 7n^2p^2q^2(17 - 4pq\,(77 - 261pq)) \\
&\qquad + 70n^3p^3q^3(7 - 34pq) + 105n^4p^4q^4.
\end{aligned}$$
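The factored expressions for the central moments of the binomial series can be checked numerically against the definition of a moment about the mean. The following sketch does so with exact fractions for the second through eighth moments; the function names are illustrative, and `r` abbreviates the product pq.

```python
from fractions import Fraction
from math import comb


def central_direct(k, n, p):
    """k-th moment about the mean np, computed from the binomial frequencies."""
    q = 1 - p
    mean = n * p
    return sum((x - mean) ** k * comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1))


def central_closed(n, p):
    """Factored forms of mu_2 .. mu_8 in terms of n, p, q and r = pq."""
    q = 1 - p
    r = p * q
    return {
        2: n * r,
        3: n * r * (q - p),
        4: n * r - 6 * n * r**2 + 3 * n**2 * r**2,
        5: (q - p) * (n * r - 12 * n * r**2 + 10 * n**2 * r**2),
        6: (n * r - 30 * n * r**2 * (q - p) ** 2 + 25 * n**2 * r**2
            - 130 * n**2 * r**3 + 15 * n**3 * r**3),
        7: (q - p) * (n * r - 60 * n * r**2 + 360 * n * r**3 + 56 * n**2 * r**2
                      - 462 * n**2 * r**3 + 105 * n**3 * r**3),
        8: (n * r * (1 - 42 * r * (3 - 40 * r * (1 - 3 * r)))
            + 7 * n**2 * r**2 * (17 - 4 * r * (77 - 261 * r))
            + 70 * n**3 * r**3 * (7 - 34 * r) + 105 * n**4 * r**4),
    }


if __name__ == "__main__":
    n, p = 7, Fraction(2, 5)
    closed = central_closed(n, p)
    for k in range(2, 9):
        print(k, closed[k], central_direct(k, n, p))
```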








ON CERTAIN DISTRIBUTION FUNCTIONS WHEN THE LAW OF THE
UNIVERSE IS POISSON'S FIRST LAW OF ERROR¹


By Frank M. Weida


Introduction. The median, which is that value of a permuted variable which has as many observed values on one side of it as on the other, appears to be the natural competitor of the arithmetic mean when we are interested in the probable or most probable value of an unknown quantity. It is well known² that the law of probability, namely, Poisson's first law of error, which results from the assumption that the median is the most probable value of the unknown quantity is given by

$$f(x) = \frac{k}{\sigma}\,e^{-|x|/\sigma}. \tag{1}$$

Little is known about the form of the distribution functions of the more important statistics when the law of the "Universe" is Poisson's first law of error. It, therefore, appears to be of interest and importance to enlarge our present knowledge of distribution functions by finding certain new ones when the variable or variables are defined by (1).

In this paper we present the following results: (1) We have obtained an 
explicit expression for the distribution of means of samples of n; (2) we have 
obtained an explicit expression for the distribution of differences; (3) we have 
obtained an explicit expression for the distribution of quotients; (4) we have 
obtained an explicit expression for the distribution of standard deviations for 
samples of n; (5) we have obtained an explicit expression for the distribution of 
geometric means for samples of n; (6) we have obtained an explicit expression 
for the distribution of harmonic means for samples of n. 

In our analysis, we have made use of the theory of characteristic functions in the sense of Levy.³ This theory has been extended to more than one dimension by V. Romanovsky⁴ and by E. K. Haviland.⁵ S. Kullback,⁶ in his thesis, has made further extensions and has applied them successfully to the distribution problem in statistics.


1 Presented to the American Mathematical Society, February 23, 1935.

2 Brunt, David: "The Combination of Observations," 1923, p. 27.

3 Levy, P.: "Calcul des Probabilités," pp. 153-191.

4 Romanovsky, V.: "Sur un théorème limite du calcul des probabilités," Recueil mathématique de la Société mathématique de Moscow, Vol. 36, 1926, pp. 36-64.

5 Haviland, E. K.: "On the inversion formula for Fourier-Stieltjes transforms in more than one dimension," American Journal of Mathematics, Vol. 57, 1935, pp. 94-101.

6 Kullback, S.: "An application of characteristic functions to the distribution problem of statistics," Annals of Mathematical Statistics, Vol. V, No. 4, pp. 263-307.











The explicit expression for the distribution of arithmetic means of samples of n is not new. This law of distribution has previously been obtained otherwise by F. Hausdorff⁷ and by A. T. Craig.⁸ It is inserted here to show the superiority and greater power of our method when compared with previous methods and for the completeness of our discussion. The other results offered in this paper, as far as the writer knows, are new.


1. The distribution of arithmetic means. Let us consider

$$f(x) = \frac{k}{\sigma}\,e^{-|x|/\sigma}, \qquad (-\infty < x < \infty). \tag{2}$$


If we assume that $x_1, x_2, \ldots, x_n$ are independently distributed and that each $x_i\ (i = 1, 2, \ldots, n)$ is distributed according to the same distribution law, namely, Poisson's first law of error, then it is fairly easy to see that the characteristic function for the law of distribution of means of samples of n is given by


¢(t) = f. os a} (3) 


Ifu = Diz; (@ = 1, 2, --- , n), then it follows that the distribution function 
of u, namely, P(u), is given by 


"a ' 2). a want n 
Pa a 2 | cnt I ems ix} dt, (4) 
a ao Jo 


which, upon simplification becomes 


gn-l k" a e itu dt 


> Pn dl “ap nanan 
P(u) ro" J_4 (1 —oit)"” 


(5) 

It is readily seen that the poles of the integrand are of the nth order and are those of $(1 - \sigma it)^{-n}$. It follows by the well known Residue Theorem of Cauchy⁹ that





gal k” : (iy 1 q" ( e itu 

Pte) an el ain, ; 6) 
Oe ye a a Nd + oity"f pad 6 

If now, we replace $u$ by $n|\bar{x}|$, we will obtain the desired law of the distribution of arithmetic means of samples of n which is


m 9” k*(—1)""'n q" e z - 
P(\z !) — o”1" (n a 1)! at" a 4. ait)” wail (7) 





defined for all values of x on the range $(-\infty < x < \infty)$.
7 Hausdorff, F.: Beiträge zur Wahrscheinlichkeitsrechnung, Königlich Sächsischen Gesellschaft der Wissenschaften zu Leipzig. Berichte über die Verhandlungen, Math.-Phys. Classe, Vol. 53, 1901, pp. 152-178.
8 Craig, A. T.: "On the distribution of certain statistics," American Journal of Mathematics, Vol. 54, 1932, pp. 353-366.
9 Macrobert, T. M.: "Functions of a Complex Variable," 1933, pp. 57, 295.







A. T. Craig⁸ has given the distribution laws of arithmetic means of samples of size 2, 3, and 4. These results as well as the results for any n are readily obtained from (7).


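2. The distribution of differences. Let us assume that the laws of distribution of x and y are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\,e^{-|x|/\sigma_1};\qquad f(y) = \frac{k_2}{\sigma_2}\,e^{-|y|/\sigma_2};\qquad (-\infty < x < \infty),\ (-\infty < y < \infty).$$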


In this case, the characteristic function of the law of distribution of differ- 
ences (x — y) is given by 











k a it iota oad ~ a ~it{y) —\¥! 
¢(t) = bs f e " aed | e 2 dy. (8) 
O1 J—a Cs J-« 
Performing the operations indicated in (8) and simplifying, we find that 
4kike 1 1 
o(t) = ——. _ “ (9) 
aio, (1 — ot) (1 + ovit) 
It is fairly easy to see that the distribution law of u is given by 
+ db. @ —itu ? 
i i ec eimsss (10) 
210102 a el _ oit)(1 +- oot) 
Now, let {(1/o1) — it} = v/u, then (10) becomes 
OL ib Oy ates RR Sicily ; 
P(u) = — 2a Mh scapininigeel A lacticrenity, (11) 
1410 \03(0, 4- 2) Ba | he nice v 
7 (— ») (* +o «i 
\ 7162 } 
The integral in (11) is convergent because
Lim | v” ‘cena attaeans | = 0. 
va v | 
1+ — 
(— ») (« + 9% y 
002 | 
Hence, we find that 
iE (0+) ee 
P(u) =e 2kikee 1 e-’ di a (12) 








ri0\0;(01 + o2) a i+ v 
(— v) 


01 02 
1+ oO | 
0102 










which upon simplification becomes 


P(u) = an as W ce 7% ub, (13) 
"2 


o\02(0,; + 02) ° O10 





, 0; + 02 . ° ° 
where W 1 { —— — u} is the confluent hypergeometric function.” 
"9 o 


It is well known that 


1 
, e 2 2k © -t-L+m t\k-3+m 
ee, 2 2 en 
Vv k, m(Z) r(3 me k + m) 0 t ¢ + ‘) . ” 


for all values of k and m and for all values of z except negative real values. 
Clearly, 


™ a+ 





71% 2 * a 
Ww a¢9,\,¢.. e-' dt 
a4 _— r(1) 0 


which, upon simplification becomes 








a,t+eo 

-—— & 

Wo " 2 | =e 12, (14) 
/* 002 
Hence, we now find that 

tm “ee 

Pigd cs negate @ P7192 (15) 
0.(o, + G2) 
If now, we replace u by | x | — | y | , we will obtain the desired law of distri- 
bution of differences which is 
a ;+%% = 

Akiak ee 
P(|z| —|y )) = ——"_, e “*"* (16) 


0104(0, + 02) 


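3. The distribution of ratios. We assume that the laws of distribution of x and y are independent and that they are given respectively by

$$f(x) = \frac{k_1}{\sigma_1}\,e^{-|x|/\sigma_1};\qquad f(y) = \frac{k_2}{\sigma_2}\,e^{-|y|/\sigma_2};\qquad (-\infty < x < \infty),\ (-\infty < y < \infty).$$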





10 Whittaker, E. T. and Watson, G. N.: ‘‘A course in modern Analysis,’”’ 1915, pp. 333- 
334. 










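Let $u = \log|x| - \log|y|$. The characteristic function of the law of distribution of quotients is then given by

$$\begin{aligned}
\phi(t) &= \frac{k_1}{\sigma_1}\int_{-\infty}^{\infty} e^{-|x|/\sigma_1}\,(|x|)^{it}\,dx \cdot \frac{k_2}{\sigma_2}\int_{-\infty}^{\infty} e^{-|y|/\sigma_2}\,(|y|)^{-it}\,dy \\
&= \frac{4k_1k_2}{\sigma_1\sigma_2}\int_{0}^{\infty} e^{-x/\sigma_1}\,x^{it}\,dx \int_{0}^{\infty} e^{-y/\sigma_2}\,y^{-it}\,dy.
\end{aligned} \tag{17}$$

Now, let $s = x/\sigma_1$ and $w = y/\sigma_2$, then clearly

$$\phi(t) = 4k_1k_2\,\sigma_1^{it}\sigma_2^{-it}\int_{0}^{\infty} e^{-s}s^{it}\,ds \int_{0}^{\infty} e^{-w}w^{-it}\,dw,$$

whence

$$\phi(t) = 4k_1k_2\,\sigma_1^{it}\sigma_2^{-it}\,\Gamma(it + 1)\,\Gamma(1 - it). \tag{18}$$

It follows that the distribution law of $u$ is given by

$$P(u) = \frac{4k_1k_2}{2\pi}\int_{-\infty}^{\infty} e^{-itu + it\log\sigma_1 - it\log\sigma_2}\,\Gamma(it + 1)\,\Gamma(1 - it)\,dt$$

which upon simplification, becomes

$$P(u) = \frac{2k_1k_2}{\pi}\int_{-\infty}^{\infty} e^{-i(u - \log\sigma_1 + \log\sigma_2)t}\,\Gamma(it + 1)\,\Gamma(1 - it)\,dt. \tag{19}$$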
Tw —a 
Now, let $(1 - it) = -v$, then (19) becomes


Lee —i+ita 
P(u) = thike e—? {u-loge, + loge} — {u—loge, +loge9} [2 + v) P(—v) dv. (20) 


272 l—te 


Since it can be shown that?! 


—-l-+itia 
(1/277) { e-“T'(2 + v) M(— v) dv = (2) {1 + (1/e")} >, 
—]j ; 


ta 


we find that (20) becomes 





P(u) = thee “2 (9) {1 + Ay (21) 
°02 o2€ 
Now, put e" = |x|/|y| = R, whence from (21) we will obtain the desired law 
of distribution of quotients which is 
a_ln ‘ —2 
P(R) = 4kikeoiT (2) } 4 } (22) 
ook ook 





11 Macrobert, T. M., ‘Functions of a Complex Variable,’ 1933, pp. 114, 139, 151. 
Whittaker, E. T. and Watson, G. N., ‘‘A course in modern Analysis,’’ 1915, pp. 283. 













4. The distribution of variances and standard deviations. If we assume that the variance and standard deviation are calculated about a sample mean and if we let $u = \sum_{i=1}^{n-1} x_i^2$, and if the $x_i$ are independently distributed and each $x_i$ is distributed according to the same distribution law, namely, Poisson's first law of error, then it is clear that the characteristic function for the law of distribution of variances of samples of n is

$$\phi(t) = \left\{\frac{k}{\sigma}\int_{-\infty}^{\infty} e^{-|x|/\sigma}\,e^{itx^2}\,dx\right\}^{n-1} = \left\{\frac{2k}{\sigma}\int_{0}^{\infty} e^{-x/\sigma}\,e^{itx^2}\,dx\right\}^{n-1}. \tag{23}$$


Let I represent the integral in the right-hand member of (23). We obtain
1 
that (dI/do) = I/o?, whence J = Ce *. Making use of the conditions: 
a ans 
oa, 1 | et? dx = et” Ve 
0 Vt 
1 
o—a, Ce * —C, whence we find that 


I e” «dr=e iV e° 
0 Vt 
Clearly, it follows that 
(n—-1)ri n—1 
Q™-14"-1¢ 4 3 a 1 
(t) - ‘pais é © (24) 
gs 2 


We now find that the distribution law of u is given by 


(n—1)ri n-1 n—1 


n—lj.n—-1, 4 2 o a@ uu 
P(u) = an ee Ree / . dt. (25) 


210"! 


Evaluating the integral in (25) with a suitably chosen contour,” we find that, 


se 
grb" 26 Qe . 


P(u) = ————_—_—.—_ u 2 e€™. (26) 


n 
Now, let u = >>; 2? = ns*, whence from (26) we will obtain the desired law 
¢=1 
of distribution of variances which is 
ai n—1 


n—l}z,.n—1 2 7 a n—3 n—3 
2 k TT e n 2 (s?) = a s2 


: (27) 
o”? r(*5") 


12 Macrobert, T. M., ‘‘Functions of a Complex Variable,”’ 1933, p. 67. 





P(s?) = 










The law of distributions of standard deviations can be obtained at once from 
(27) since d(s?) = 2s ds. 

We shall now give the specific laws of distribution of variances for samples of 
size 1, 2,3, 4, and 5 when the law of the ‘“‘Universe”’ is Poisson’s first law of error. 
From (27), 








For n = 1, 

P(s*) = 0, (O< s < o), (28) 
Forn = 2, 

ieee tigi 
MA a. (0<s2 <0), (29) 
aos 

For n = 3, 

Pe) = Se, (<8 <x). (30) 
Forn = 4, 

an 
a 0 <8 <x). (31) 


For n = 5, 
A 


80k*xe 7 s%e-* 


P(s?) = : ‘ O<s < @). (32) 
o 


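5. The distribution of geometric means. As before, we assume that the $x_i$ are independently distributed and each $x_i$ is distributed according to the same distribution law, namely, Poisson's first law of error. Then, clearly, the characteristic function for the law of distribution of geometric means of samples of n is

$$\phi(t) = \left\{\frac{k}{\sigma}\int_{-\infty}^{\infty} e^{-|x|/\sigma}\,|x|^{it}\,dx\right\}^{n} = \left\{\frac{2k}{\sigma}\int_{0}^{\infty} e^{-x/\sigma}\,x^{it}\,dx\right\}^{n}. \tag{33}$$

Now, put $s = x/\sigma$, then (33) becomes

$$\phi(t) = \left\{2k\,\sigma^{it}\int_{0}^{\infty} e^{-s}s^{it}\,ds\right\}^{n} = 2^{n}k^{n}\sigma^{nit}\,\{\Gamma(it + 1)\}^{n}. \tag{34}$$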


It follows at once that the distribution law of wu is 


Qn kn 





P(u) = etutnlogot (P(t + 1)}" dt. (35) 
Zr J—e 













Now, let 7¢ + 1 = —v, then (35) becomes 
_Qnfon —I+ia 
P(u) ie a eutn loge J ev(utn log a) { r( ae v) js dv . (36) 
271 —l-—ia 
It is well known that (10) 
sai NaN 
LON a ee, (37) 
sin” rv{T(v + 1)}* 
Using (37) in (36), we readily find that 
ae kn —l+ia er(utn logo) ( _ 1)” 4” 
P SER cvinagpensnageens utn loge [ tnaiiceshiectbametiaieaes. a ‘ 
(u) ed J-i-ia «{T(v+1)}"sin" xv ” (38) 


It is fairly easy to see that the poles of the integrand in (38) are the poles of $\{\Gamma(-v)\}^{n}$ and that these poles are of the nth order. Applying the well known Residue Theorem of Cauchy (8), we find that


= (—1)ntnati { qn [ er(utn loge) | ; 
P = 2 kn utn loge 5 any ae ; ‘ 
(u) é ~(n — 1)! dy" iT(v + 1)}" i (39) 


Now, since $u = \log|x_1| + \log|x_2| + \cdots + \log|x_n|$, then clearly, the distribution law of the geometric mean, G, is obtained from the law of distribution for u by means of the transformation


a=0 


$$u = \log\,(G)^{n}.$$


Hence, from (39), we find the desired law of distribution of geometric means 
of samples of n which is 


Qn kG o” = qd" Gr o”” 
PT ly ee cas PETE: Nae eee eae ; 
eee ae ‘elroy. = 


6. The distribution of harmonic means. Let us assume that $f(x)$ is the law of distribution for x. It is well known¹³ that the law of distribution of $x' = 1/x$ is given by

$$F(x') = \frac{1}{x'^2}\,f\!\left(\frac{1}{x'}\right)$$

if $1/x$ is continuous on the range of definition of $f(x)$. Now, in case $f(x)$ is Poisson's first law of error, we find that


F(z’) = F(1/z) = Fate +; (-as<2x<0), O0<2Sa). (41) 








13 Dodd, E. L., ‘“The frequency law of a function of one variable,’’ Bulletin of the Amer- 
ican Mathematical Society, Vol. 31, 1925, p. 28; ‘“‘The frequency law of a function of vari- 


ables with given frequency laws,’’ Annals of Mathematics, Second Series, Vol. 27, 1925-26, 
p. 18. 







We assume that the $x_i$ are independently distributed and each $x_i$ is distributed according to the same law of distribution, whence we find that the characteristic function for the law of distribution of harmonic means of samples of n is
“k wet " 
¢(t) = —e ¢ x2 dx>, (42) 
0 C 


from which, after simplification, we find that 


kn 27 g2n 
(1 — oit)3"* 


We now find that the law of distribution for u is 


¢(t) = (43) 


PAL k n o2" @ e itu 


P(u) = — ge (i — ott) co oit)s" dt, 


which, after evaluation and simplification, becomes 


( ) Qnkn oe—1 ae. 
P(u) = —— ue *. 44) 
“— @"P(3n) 

Recalling that in this case, $u = 1/|x_1| + 1/|x_2| + \cdots + 1/|x_n|$, we make the transformation $u = n/H$, where H is the harmonic mean; whence, from (44), we find that the desired law of distribution of harmonic means of samples of n is given by

Dn kn nenr-h eter as 
P(H) = —~—- + H]!-8e oH, (45) 
o" T'(3n) 

7. Conclusions. We have shown that the same analysis is applicable to find 
the explicit expression for all the distribution laws we have discussed in this 
paper. 


THE GEORGE WASHINGTON UNIVERSITY,
Washington, D. C.







ERRATA 


In my paper* there appear two blunders which were called to my attention by A. T. Craig. 

In section 4, pages 107-108, headed "The distribution of variances and standard deviations," I have obtained the distribution function of the sum of the squares of n − 1 independent values of x and not the distribution function of the sum of the squares of the deviations from the sample mean of the n independent values of x.

In section 2, pages 104-105, headed ‘‘The distribution of differences,’’ I have obtained 
the distribution function of the differences of absolute values and not the distribution 
function of the actual differences. 


* Weida, F. M., “On Certain Distribution Functions when the Law of the Universe is Poisson's First Law of 
Error,” Annals of Mathematical Statistics, Vol. VI, No. 2, June, 1935, pp. 102-110. 
















ON THE PROBLEM OF CONFIDENCE INTERVALS 


By J. NEYMAN 


When discussing my paper read before the Royal Statistical Society on 19th 
June, 1934, Professor Fisher said that the extension of his work concerning the 
fiducial argument to the case of discontinuous distributions, as presented in 
my paper, has been reached at a great expense: that instead of exact probability 
statements we get only statements in the form of inequalities. 

This remark raises the question whether the disadvantage of the solution 
which he mentioned (the inequalities instead of equalities) results from the un- 
satisfactory method of approach, or whether it is connected with the nature of 
the problem itself. 

I think that the problem is of considerable general interest. For instance it 
may be asked whether the confidence intervals for the binomial distribution 
recently published by E. S. Pearson and C. J. Clopper,¹ which correspond to
the probability statements in inequalities, could be bettered. 

The purpose of the present note is to show, (1) that in some exceptional cases 
the exact probability solution of the problem exists and that then it may easily 
be found by the method described in Note I of my paper;² (2) that in the general
case of discontinuous distribution exact probability statements in the problem 
of confidence intervals are impossible. 

In particular it will be seen that exact probability statements are impossible 
in the case of the binomial distribution and so that the system of confidence 
intervals published by Clopper and Pearson could not be bettered. 

In order to avoid any possible misunderstanding I shall start by restating 
the problem. 

We shall consider a random discontinuous variate x, capable of having one 
or another of a finite, or at most denumerable set of values 


$$x_1, x_2, \ldots, x_n, \ldots \tag{1}$$


We shall assume that the frequency function, say $p(x \mid \theta)$, of $x$ depends upon one
parameter $\theta$, the value of which is unknown. The problem of confidence intervals
consists in ascribing to every possible value of $x$, e.g. to $x_n$ $(n = 1, 2, \ldots)$,
a “confidence interval,” say $\theta_1(n)$ to $\theta_2(n)$, such that the probability, $P$, of our
being correct in stating

$$\theta_1(n) \le \theta \le \theta_2(n) \tag{2}$$

whenever we observe $x = x_n$ $(n = 1, 2, \ldots)$, is either:
¹ E. S. Pearson and C. J. Clopper: The Use of Confidence or Fiducial Limits in the
Case of the Binomial. Biometrika Vol. XXVI, pp. 404-413.
² J.R.S.S. Vol. 97, p. 589.




(a) equal to a given value $\alpha < 1$ chosen in advance, or

(b) at least equal to this value $\alpha$.

I proposed to call this chosen value $\alpha$ the confidence coefficient.

In the earlier paper I showed that the solution of the problem in its form (b) 
is always possible and easy to find. If the variate x is continuous, then the 
solution of the problem (a) is equally easy. At present we shall consider whether 
and under what conditions the solution (a) is possible when the variate x is 
discontinuous. 

Suppose that the variate x is discontinuous as described above, and that the 
solution of the problem in its form (a) exists and is given by the system of 
confidence intervals $(\theta_1(x_n), \theta_2(x_n))$ for $n = 1, 2, \ldots$.

The position is illustrated in the diagram below. On the axis of abscissae 
the possible values of the variate x are marked. The axis of ordinates is the 
axis of $\theta$. The confidence intervals are marked on verticals passing through
corresponding values of $x$.














[Figure: Diagram representing the confidence intervals. A marked point belongs to the set of acceptance $X(\theta)$; the labels $\theta_1(n)$ and $\theta_2(n)$ indicate the ends of a confidence interval.]





P is the probability of an event, say E, which we shall describe in some detail. 
Let us denote generally the probability of any event a by P{a}. P{a| b} will 
denote the probability of an event, a, calculated under the assumption that 
another event, b, has already occurred. 

Now 


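$$
\begin{aligned}
P = P\{E\} = {} & \text{the probability that } \{\text{either } (x = x_1) \text{ and then } \theta_1(1) \le \theta \le \theta_2(1), \\
& \text{or } (x = x_2) \text{ and then } \theta_1(2) \le \theta \le \theta_2(2), \\
& \ldots \\
& \text{or } (x = x_n) \text{ and then } \theta_1(n) \le \theta \le \theta_2(n), \ \ldots\}
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
P = {} & P\{x = x_1\}\,P\{\theta_1(1) \le \theta \le \theta_2(1) \mid (x = x_1)\} \\
& + P\{x = x_2\}\,P\{\theta_1(2) \le \theta \le \theta_2(2) \mid (x = x_2)\} + \cdots \\
= {} & \sum_n P\{x = x_n\}\,P\{\theta_1(n) \le \theta \le \theta_2(n) \mid (x = x_n)\}.
\end{aligned}
\tag{4}
$$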


The calculation of the probability $P$ in the above form is not convenient, as
both multipliers in each term of the sum in (4) depend upon the unknown
probability function a priori of $\theta$. Therefore we shall present $P$ in another
form, giving to the event $E$ a geometrical interpretation. Let us denote by
$CB$ the set of all confidence intervals $(\theta_1(n), \theta_2(n))$, as marked on the plane of
$x$ and $\theta$. Thus $CB$ will be composed of points with co-ordinates $x$ and $\theta$, where

$$
\left.
\begin{aligned}
x &= x_n \\
\theta_1(n) &\le \theta \le \theta_2(n)
\end{aligned}
\right\} \quad (n = 1, 2, \ldots) \tag{5}
$$

The set $CB$ will be called the confidence belt.

Denote by $A$ any point of the plane of $x$ and $\theta$, having any values for its
co-ordinates.

It is easily seen that the event, which we denote by $E$, and the probability
of which is $P = \alpha$, consists in the point $A$ belonging to the confidence belt $CB$.
In fact the event $E$ occurs if and only if the co-ordinates of $A$ fulfil the conditions
(5). But just these conditions define the points belonging to $CB$.

The above circumstance allows us to calculate P by means of a formula which 
discloses its connection with $p(x \mid \theta)$.

Fix any possible value of $\theta = \theta'$ and draw the straight line $LL$ the points
of which have just this fixed value $\theta'$ for their ordinates. The line $LL$ will cut
some of the confidence intervals. Denote by $X(\theta')$ the set of points of intersection,
and by $\phi(\theta)$ the unknown frequency function of $\theta$. The set $X(\theta)$ will
be called the set of acceptance corresponding to the specified value of $\theta$.

The function $\phi(\theta)$ may be continuous or not. So may be $p(x \mid \theta)$ considered
as a function of $\theta$. These cases may be treated together if we agree that $\sum_\theta F(\theta)$
will denote either the sum or the integral of $F(\theta)$ extending over all values of $\theta$,
whenever $F(\theta)$ is integrable.











Using this notation we may write 


$$P = P\{E\} = \sum_\theta \Big\{ \phi(\theta) \sum_{X(\theta)} \big(p(x \mid \theta)\big) \Big\} \tag{6}$$

where $\sum_{X(\theta)}$ denotes the summation over all values of $x$ belonging to $X(\theta)$.

From the formula (6) may be deduced the following important proposition.
The probability $P$ may possess a constant value $\alpha$, independent of the properties
of the unknown function $\phi(\theta)$, if and only if for each $\theta$

$$\sum_{X(\theta)} \big(p(x \mid \theta)\big) = \alpha. \tag{7}$$


The condition (7) is obviously sufficient to have $P = \alpha$. In fact, if it is satisfied,
then we should get from (6)

$$P = \alpha \sum_\theta \big(\phi(\theta)\big) = \alpha \tag{8}$$

since $\sum_\theta (\phi(\theta)) = 1$ whatever the frequency distribution of $\theta$. It is equally
easy to see that the condition (7) is necessary for having $P = \alpha$ whatever the
function $\phi(\theta)$. For suppose that for $\theta = \theta_1$ we have

$$\sum_{X(\theta_1)} \big(p(x \mid \theta_1)\big) = \beta \ne \alpha. \tag{9}$$

Then if it happens that

$$\phi(\theta) = 1 \quad \text{for } \theta = \theta_1 \tag{10}$$

and

$$\phi(\theta) = 0 \quad \text{for } \theta \ne \theta_1, \tag{11}$$

the only term in the sum $\sum_\theta$ which is different from zero will be that corresponding
to $\theta = \theta_1$, and the formula (6) will reduce to

$$P = \sum_{X(\theta_1)} \big(p(x \mid \theta_1)\big) = \beta \ne \alpha. \tag{12}$$





The original question, whether the solution of the form (a) is possible when 
the variate x is discontinuous is thus put in the following form: is it possible 
to define for every possible value of $\theta$ a set of acceptance $X(\theta)$ such that the
equation (7) holds good?

The answer is: in some cases it may be possible, but this depends upon the
nature of the function $p(x \mid \theta)$. It is very easy to invent functions $p(x \mid \theta)$ for
which the equation (7) for a definite value of $\alpha$ holds good, and we may even
fix in advance the sets of acceptance $X(\theta)$. However the important question
is not whether there may exist elaborately invented cases of discontinuous 
distributions where the solution (a) exists, but rather whether this solution 
exists always, or at least whether it exists frequently and in cases which are 
practically important. 
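
The remark that such functions are easy to invent can be illustrated by a small sketch. Everything in it is a hypothetical construction, not an example taken from this note: $\theta$ and $x$ are integers, $x$ equals $\theta$, $\theta + 1$ or $\theta + 2$ with probabilities 1/2, 3/10 and 1/5, and the set of acceptance is fixed in advance as $X(\theta) = \{\theta, \theta + 1\}$. Condition (7) then holds with $\alpha = 4/5$, and formula (6) gives the same $P$ for any a priori distribution of $\theta$.

```python
from fractions import Fraction

# Hypothetical frequency function p(x | theta): x - theta is 0, 1 or 2.
OFFSET_PROBABILITIES = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}


def p(x, theta):
    """Invented frequency function of x for a given integer theta."""
    return OFFSET_PROBABILITIES.get(x - theta, Fraction(0))


def acceptance_set(theta):
    """Set of acceptance X(theta), fixed in advance (an assumption)."""
    return {theta, theta + 1}


def coverage(prior):
    """Formula (6): P for an a priori distribution given as {theta: weight}."""
    return sum(
        weight * sum(p(x, theta) for x in acceptance_set(theta))
        for theta, weight in prior.items()
    )


# Two different a priori distributions give the same P = 4/5.
print(coverage({0: Fraction(1)}))
print(coverage({3: Fraction(1, 3), 7: Fraction(2, 3)}))
```

In this invented case the confidence interval belonging to an observed $x$ runs from $\theta = x - 1$ to $\theta = x$, since these are the only values of $\theta$ whose set of acceptance contains $x$.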


















This question must be answered in the negative on the basis of the following 
example concerning the most important of the discontinuous distributions, the 
Binomial. 

In fact it will be seen below that if $x$ is a variate following the binomial frequency
law, then whatever the arrangement of the sets of acceptance $X(\theta)$,
corresponding to different values of $\theta$, the left hand side of the equation (7)
cannot be constantly equal to the confidence coefficient $\alpha < 1$. It will follow
that in the case of the binomial distribution, the solution of the problem (a)
is impossible.

To prove this we shall consider the variate, $x$, following the binomial frequency
law. That is to say we shall assume that $x$ may have values $0, 1, 2, \ldots, n$,
and that

$$p(x \mid \theta) = \frac{n!}{x!\,(n - x)!}\,\theta^x (1 - \theta)^{n - x} \tag{13}$$

while $0 < \theta < 1$. Since the set of possible values which $x$ may have is finite,
therefore the set of all confidence intervals must be finite also. It follows that there
is possible only a finite number of sets of acceptance $X(\theta)$. Therefore there
must be at least one set of acceptance, say $X^0$, which will be common to an
infinite number of values of $\theta$, say $\theta_1, \theta_2, \ldots, \theta_k, \ldots$, so that for each it will
be $X(\theta_k) = X^0$.

Now 


$$\sum_{X(\theta_k)} \big(p(x \mid \theta_k)\big) \tag{14}$$

for all these values of $\theta = \theta_k$ will be the same polynomial in $\theta$ of the order $n$.
If it has the same value $\alpha$ for a number of values of $\theta$ exceeding $n$, it means that
this polynomial is an absolute constant. Therefore if it were possible to give
a solution of the type (a) in the case of the binomial distribution, it would be
possible to construct a sum (14), the terms of which are all different and have
the form (13), and such that after all possible reductions and simplifications
all terms involving $\theta$ would cancel and we should be left only with one constant
term $\alpha < 1$. This, however, is impossible, since the only term of the form (13)
which involves a constant, is the term corresponding to $x = 0$


$$p(0 \mid \theta) = (1 - \theta)^n = 1 - n\theta + \frac{n(n - 1)}{2}\,\theta^2 - \cdots$$

and then this constant is 1. Other terms of the form (13) involve $\theta^x$ as a multiplier.
Therefore there exists only one sum of the form (14) which is an absolute
constant, but this includes all the terms (13)

$$\sum_{x = 0}^{n} \big(p(x \mid \theta)\big) = 1$$

and thus is of no value. It follows that whatever the sets of acceptance $X(\theta)$
the corresponding sum (14) will have values varying with the value of $\theta$ and
hence the solution of the type (a) in the case of the binomial does not exist.
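
The argument can be checked numerically in a small case. The following is only an illustrative sketch: the choice $n = 3$, the four values of $\theta$ and the function names are assumptions made for the illustration. For every set of acceptance other than the empty set and the full set $\{0, 1, \ldots, n\}$, the sum (14) is evaluated exactly at $n + 1$ distinct values of $\theta$; a polynomial of order $n$ taking one value at $n + 1$ points would be an absolute constant, so finding differing values shows that the sum varies with $\theta$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binomial_term(x, n, theta):
    """Equation (13): binomial frequency function p(x | theta)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)


def acceptance_sum(acceptance_set, n, theta):
    """Sum (14): total probability of the set of acceptance at a given theta."""
    return sum(binomial_term(x, n, theta) for x in acceptance_set)


def proper_subsets(n):
    """All sets of acceptance except the empty set and the full set {0, ..., n}."""
    values = range(n + 1)
    for size in range(1, n + 1):
        yield from combinations(values, size)


n = 3
thetas = [Fraction(k, 5) for k in range(1, n + 2)]  # n + 1 distinct values in (0, 1)

for subset in proper_subsets(n):
    sums = [acceptance_sum(subset, n, theta) for theta in thetas]
    print(subset, "constant" if len(set(sums)) == 1 else "varies with theta")
```

Only the full set gives a constant sum, namely 1, which corresponds to the trivial statement that $\theta$ lies somewhere in its whole range.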

This, I think, gives the solution of the question raised by Professor Fisher. 
It is clear also that whenever the solution of the type (a) exists, it may be 
found by a suitable choice of sets of acceptance, and thus by the method ex- 
plained in my earlier paper. 

I should like now to raise another question. Past experience shows that the 
general problem of estimation may be formulated in different ways. The form 
of this problem as it appears in Bayes theorem, required for its solution the 
knowledge of the probabilities a priori. 

The form of the same problem treated by R. A. Fisher in his theory of esti- 
mation was solved in terms of a new conception, that of likelihood. 

The problem of estimation in its form of confidence intervals stands entirely 
within the bounds of the theory of probability, without involving any concep- 
tion not already inherent in this theory. In the case of continuous distribution 
the problem also allows the solution (a) entirely independent of the probabilities 
a priori. Now it is shown that the necessity of the solution (b) is bound up 
with the nature of the problem if the distributions are discontinuous. 

My question is: is it possible to formulate the problem of estimation in a
fourth form, leading to a solution which (1) stands entirely on the grounds of
the classical theory of probability, and (2) does not depend upon the probabilities
a priori, whatever the conditions of the problem?





ANALYSIS OF VARIANCE CONSIDERED AS AN APPLICATION OF 
SIMPLE ERROR THEORY 


By WALTER A. HENDRICKS


The need for an elementary presentation of the methods of analysis of vari- 
ance has been recognized by many investigators in various fields of research. 
A recent monograph by Snedecor (1934) is undoubtedly the most comprehensive 
attempt to satisfy this need which has appeared in the literature relating to 
the subject. Snedecor’s treatment of the subject consists largely of the presen- 
tation of a number of standard types of problems to which the methods of 
analysis of variance are applicable, directions for performing the necessary com- 
putations, and a discussion of the conclusions which may be drawn from the 
data on the basis of the analysis. 

In the opinion of the author of this paper, an elementary presentation of some 
of the theoretical considerations upon which the methods of analysis of variance 
are based would also be of some value. The methods of analysis of variance, 
as given by Fisher (1932), are presented as a natural consequence of intraclass 
correlation theory. However, the essential concepts may be presented in a 
more comprehensible form by the use of simple error theory. 

It seems appropriate to begin such a presentation with a definition of variance. 
If we have an infinite number of measurements of the same quantity, the 
variance of a single measurement is defined as the arithmetic mean of the 
squares of the errors of those measurements. In actual practice, an infinite 
number of measurements can never be obtained. We have instead a sample 
of $n$ measurements, $x_1, x_2, \ldots, x_n$, from which the variance of a single measurement
may be estimated. By referring to any text on the method of least
squares, it may be verified that the best estimate, $S^2$, of the variance of a single
measurement which can be obtained from a sample of $n$ measurements is given
by the equation:

$$S^2 = \frac{(x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2}{n - 1} \tag{1}$$

in which $m$ represents the arithmetic mean of the $n$ measurements. The
quantity, $n - 1$, in the terminology of analysis of variance, is designated as
the number of degrees of freedom available for estimating $S^2$.

It is often necessary to estimate $S^2$ from a number of different samples of
measurements. In such cases, the best estimate of $S^2$ is obtained by calculating
the weighted mean of the variances estimated from the individual samples, each
variance being weighted by the number of degrees of freedom which were available
for its estimation. The number of degrees of freedom upon which such an
estimate of $S^2$ is based is given by the sum of these weights. Such an estimate
of the variance of a single measurement is often designated as the variance
“within samples.”
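
As a minimal sketch of this calculation (the function names and the representation of the data as lists of numbers are assumptions made for illustration), the variance within samples may be computed as follows. Each sample is assumed to contain at least two measurements.

```python
def sample_variance(values):
    """Estimate of the variance of a single measurement from one sample."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_within_samples(samples):
    """Weighted mean of the sample variances, weighted by degrees of freedom.

    Returns the estimate together with the degrees of freedom on which it
    is based (the sum of the weights).
    """
    weights = [len(sample) - 1 for sample in samples]
    weighted_sum = sum(w * sample_variance(s) for w, s in zip(weights, samples))
    total_weight = sum(weights)
    return weighted_sum / total_weight, total_weight


estimate, degrees_of_freedom = variance_within_samples([[1, 2, 3], [2, 4, 6, 8]])
print(estimate, degrees_of_freedom)
```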

In one of the simpler applications of analysis of variance, a number of samples 
of measurements are available, and the investigator is required to determine 
whether the magnitude of the quantity measured varied from sample to sample 
or whether all of the measurements may be regarded as having been made upon 
a quantity of the same magnitude. 

An estimate, $S^2$, of the variance within samples may be obtained. Since $S^2$
is an estimate of the variance of a single measurement, the variance, $S_i^2$, of the
arithmetic mean, $m_i$, of the measurements in any one sample is given by the
equation:

$$S_i^2 = \frac{S^2}{n_i} \tag{2}$$

in which $n_i$ represents the number of measurements in the sample. Let there
be $r$ samples. Then another estimate, $S_i'^2$, of the variance of the mean, $m_i$,
may be obtained from the observed distribution of the means, $m_1, m_2, \ldots, m_r$,
by the use of the formula for calculating the variance of a weighted observation
as given in texts on the method of least squares:

$$S_i'^2 = \frac{1}{n_i (r - 1)} \left[ n_1 (m_1 - m)^2 + n_2 (m_2 - m)^2 + \cdots + n_r (m_r - m)^2 \right] \tag{3}$$

in which:

$$m = \frac{n_1 m_1 + n_2 m_2 + \cdots + n_r m_r}{n_1 + n_2 + \cdots + n_r} \tag{4}$$


Equations (2) and (3) yield two estimates of the variance of the mean, $m_i$.
It is apparent that these two estimates will be equal, within the limits of sampling
fluctuations, if all of the measurements in the $r$ samples were made upon
a quantity of the same magnitude. If the magnitude of the quantity measured
varied from sample to sample, $S_i'^2$ will be greater than $S_i^2$. However, in actual
practice, the two estimates of the variance of a particular mean are not compared
directly. An equivalent comparison is made between two estimates of
the variance of a single measurement. The first of these is nothing more than
the variance within samples discussed earlier in this paper. The second estimate,
which may be designated by $S'^2$, is the value which would have to be
substituted for $S^2$ in equation (2) in order to make $S_i^2$ equal to the value given
for $S_i'^2$ by equation (3). It is quite apparent that $S'^2$ may be found by the
use of the equation:

$$S'^2 = \frac{1}{r - 1} \left[ n_1 (m_1 - m)^2 + n_2 (m_2 - m)^2 + \cdots + n_r (m_r - m)^2 \right]. \tag{5}$$










$S'^2$ is often designated as the variance “between samples.” A comparison of
$S'^2$ with $S^2$ is obviously equivalent to a comparison of $S_i'^2$ with $S_i^2$.
If $S'^2$ is greater than $S^2$, a statistic, $z$, may be calculated:

$$z = \frac{1}{2} \log_e \frac{S'^2}{S^2} \tag{6}$$


This statistic serves as a useful comparison between $S'^2$ and $S^2$ since its sampling
distribution is known if all of the measurements comprising the data under
investigation were made upon a quantity of the same magnitude. The distribution
of $z$, under these conditions, is given by an equation of the form:

$$df = \frac{k\, e^{n_1 z}}{\left( n_1 e^{2z} + n_2 \right)^{\frac{1}{2}(n_1 + n_2)}}\, dz \tag{7}$$

in which $n_1$ represents the number of degrees of freedom available for estimating
$S'^2$, and $n_2$ represents the number of degrees of freedom available for estimating
$S^2$. It is apparent from equation (5) that $r - 1$ degrees of freedom are available
for the estimation of $S'^2$ in the particular problem under discussion.

When any estimate of the variance of a single measurement is multiplied by
the number of degrees of freedom available for making that estimate, the resulting
product is known as a “sum of squares.” The additive property of
the sums of squares and the degrees of freedom contributes much to the elegance
of the scheme of analysis just presented and is of considerable practical importance
in problems of a type to be discussed later in this paper. In the case
of the problem discussed above, the additive property of the sums of squares
provides that the sum of the “sum of squares between samples” and the “sum
of squares within samples” is equal to the sum of the squares of the deviations
of all of the measurements from their arithmetic mean. The additive property
of the degrees of freedom provides that the sum of the “degrees of freedom
between samples” and the “degrees of freedom within samples” is equal to the
“total degrees of freedom” which is nothing more than the total number of
measurements diminished by unity.
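
The scheme of analysis described so far, including the additive property, may be sketched in a few lines. This is an illustration only: the function name, the dictionary keys and the data layout (a list of samples, each a list of numbers) are assumptions, and the variance within samples is assumed to be different from zero.

```python
from math import log


def one_way_analysis(samples):
    """Variance between and within samples, following equations (4) to (6)."""
    r = len(samples)
    sizes = [len(sample) for sample in samples]
    means = [sum(sample) / len(sample) for sample in samples]
    total_n = sum(sizes)

    # Equation (4): weighted mean of the sample means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)

    var_between = ss_between / (r - 1)       # equation (5)
    var_within = ss_within / (total_n - r)   # variance within samples

    # Equation (6), calculated only if the variance between samples is greater.
    z = 0.5 * log(var_between / var_within) if var_between > var_within else None

    return {
        "ss_between": ss_between, "df_between": r - 1,
        "ss_within": ss_within, "df_within": total_n - r,
        "ss_total": ss_total, "df_total": total_n - 1,
        "var_between": var_between, "var_within": var_within, "z": z,
    }


result = one_way_analysis([[4.1, 3.9, 4.3], [5.0, 5.2, 4.8, 5.1], [3.5, 3.7]])
print(result["ss_between"] + result["ss_within"], result["ss_total"])
```

The two printed values agree apart from rounding, which is the additive property of the sums of squares; the degrees of freedom add in the same way.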

The methods of analysis presented above may be applied to any study of the 
effects of a number of experimental treatments of the same kind upon the 
magnitude of a measurable quantity. If experimental treatments of more 
than one kind are imposed simultaneously, the effects of each may be studied 
by modifications of those methods. The discussion of those modifications, 
about to be presented in this paper, is limited to data which may be classified 
in an “r × s” table, i.e., to studies of the effects of only two kinds of experimental
treatments. More complex problems may be treated by simple extensions
of the methods presented.

Consider an “r × s” table composed of $rs$ cells, each of which contains a
number of measurements of some quantity. The magnitude of the quantity
measured may vary from cell to cell, but the essential conditions under which
the measurements were made must be the same for all cells. It is also understood
that no cell may be empty. Table 1 is an example of such a table. The
individual measurements have not been represented. Only the number of
measurements, $n_{ij}$, in each cell and the arithmetic mean, $m_{ij}$, of those measurements
have been indicated. The arguments, $a_i$, represent $r$ experimental
treatments of one kind, and the arguments, $b_j$, represent $s$ experimental treatments
of another kind. The problem to be solved is to ascertain whether or
not the differences among the experimental treatments of each kind had any
effect on the magnitude of the quantity measured.


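TABLE 1

Example of an “r × s” Table Showing Only the Number of Measurements in
Each Cell and the Arithmetic Mean of Those Measurements

            b_1           b_2          ...       b_s

  a_1    m_11, n_11    m_12, n_12      ...    m_1s, n_1s

  a_2    m_21, n_21    m_22, n_22      ...    m_2s, n_2s

  ...       ...           ...          ...       ...

  a_r    m_r1, n_r1    m_r2, n_r2      ...    m_rs, n_rs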








If each cell contains the same number of measurements, the effects of the
experimental treatments indicated by the arguments, $a_i$, may be studied by
comparing the variance “between rows” with the variance “within cells.” The
variance between rows may be calculated by regarding the $r$ rows as $r$ samples
of measurements and applying an equation of the same form as equation (5).
The variance within cells may be obtained by calculating the variance of a
single measurement from the data in each cell separately and taking the mean
of the resulting values. The effects of the experimental treatments indicated
by the arguments, $b_j$, may be studied by comparing the variance “between
columns” with the variance “within cells.”

If the degrees of freedom between rows, between columns, and within cells
are added, the sum will be less than the total number of degrees of freedom
in the table. If the corresponding sums of squares are added, the sum is likely
to be less than the total sum of squares. The differences are due to what is
customarily designated as “interaction between rows and columns.” The
more descriptive term, “differential response,” is sometimes used to designate
the same factor. The nature of this factor may be investigated by considering
the effects of the experimental treatments, $b_j$, in each row of Table 1.

The data in each cell of Table 1 may be regarded as a sample of measure- 
ments. Therefore, the data in any row may be regarded as a set of s samples 
of measurements. By applying an equation of the same form as equation (5) 
to the data in any row, an estimate of the variance of a single measurement is 
obtained from the observed distribution of the means of the cells in that row. 
By calculating the arithmetic mean of the estimates for the r rows, an estimate 
of the variance of a single measurement is obtained from r(s — 1) degrees of 
freedom. This estimate may be designated as the variance “between cells in 
the same row.” 

The variance between cells in the same row measures the average effect of
differences among the experimental treatments, $b_j$, in individual rows. The
variance between columns, which was discussed earlier in this paper, is calculated
from $s - 1$ degrees of freedom and measures the effect of differences
among the treatments, $b_j$, on the assumption that the effect of any one treatment
upon the magnitude of the quantity measured was constant for every row.
The number of degrees of freedom assignable to differential response of the
various rows to the treatments, $b_j$, is $r(s - 1) - (s - 1)$ or $(r - 1)(s - 1)$.
The sum of squares due to differential response is given by the difference between
the sum of squares between cells in the same row and the sum of squares
between columns. These relations follow from the additive property of degrees
of freedom and sums of squares.
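
A short sketch of this subtraction is given here for the special case in which every cell contains the same number of measurements. The function names and the data layout, in which `cells[i][j]` is the list of measurements for row $a_i$ and column $b_j$, are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def differential_response(cells):
    """Sum of squares and degrees of freedom due to differential response.

    Assumes an r x s table with the same number of measurements in each cell.
    """
    r, s = len(cells), len(cells[0])

    # Sum of squares between cells in the same row, r(s - 1) degrees of freedom.
    ss_cells_in_rows = 0.0
    for row in cells:
        row_mean = mean([x for cell in row for x in cell])
        ss_cells_in_rows += sum(len(c) * (mean(c) - row_mean) ** 2 for c in row)

    # Sum of squares between columns, s - 1 degrees of freedom.
    grand_mean = mean([x for row in cells for cell in row for x in cell])
    ss_columns = 0.0
    for j in range(s):
        column = [x for row in cells for x in row[j]]
        ss_columns += len(column) * (mean(column) - grand_mean) ** 2

    degrees_of_freedom = r * (s - 1) - (s - 1)  # equals (r - 1)(s - 1)
    return ss_cells_in_rows - ss_columns, degrees_of_freedom


table = [
    [[1, 3], [2, 4], [3, 7]],
    [[5, 7], [6, 8], [4, 6]],
]
print(differential_response(table))
```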

It may be observed that precisely the same results would be obtained by
considering the effects of the treatments, $a_i$, in the various columns of Table 1.
The degrees of freedom and sum of squares due to differential response of the
various columns to the treatments, $a_i$, would be exactly equal to the corresponding
values obtained for the differential response of the various rows to the
treatments, $b_j$.

Up to this point the discussion has been concerned only with the special case 
in which each cell of Table 1 contains the same number of measurements. As 
a matter of fact, the methods given for the analysis of such data will yield 
correct results when applied to any “r × s” table in which the numbers of
measurements in the cells in every row are proportional to the corresponding 
marginal totals for the columns, and the numbers of measurements in the cells 
in every column are proportional to the corresponding marginal totals for the 
rows. 

When the numbers of measurements in the various cells do not satisfy the 
above condition of proportionality, the distributions of the means of the rows 
and columns may be distorted, and, consequently, the methods of analysis 
described above may yield incorrect results. Efficient methods of analyzing 
such data have been presented by Yates (1933). A comprehensive discussion 
of these methods is considerably beyond the scope of this paper. One method,
described very briefly by Yates (1933) and designated as the “method of
weighted squares of means,” appealed to the author as being particularly
valuable for practical work. No detailed discussion of the method seems to
be available in the literature. Therefore, the following presentation may be
of some interest.

Consider the experimental treatments represented by the arguments, $a_i$, in
Table 1. It is necessary to find an average value for the magnitude of the
quantity measured for each row of Table 1. However, this average must be
of such a type that its value will not be distorted by the unequal numbers of
measurements in the various cells. The unweighted arithmetic mean of the
means of the cells in the row seems to be the logical average to use since, within
the limits of sampling fluctuations, the value of this average will be identical
with the value which would have been obtained if each cell had contained the
same number of measurements. The averages for the $r$ rows are:

$$
\begin{aligned}
m_{a_1} &= \frac{1}{s} (m_{11} + m_{12} + \cdots + m_{1s}) \\
m_{a_2} &= \frac{1}{s} (m_{21} + m_{22} + \cdots + m_{2s}) \\
&\;\;\vdots \\
m_{a_r} &= \frac{1}{s} (m_{r1} + m_{r2} + \cdots + m_{rs})
\end{aligned}
\tag{8}
$$


By the law of propagation of error, the variance of any one of these unweighted
means is given by the equation:

$$S_{a_i}^2 = \frac{1}{s^2} \left( S_{i1}^2 + S_{i2}^2 + \cdots + S_{is}^2 \right) \tag{9}$$

in which $S_{a_i}^2$ is the variance of $m_{a_i}$, and $S_{i1}^2, S_{i2}^2, \ldots, S_{is}^2$ are the variances of
$m_{i1}, m_{i2}, \ldots, m_{is}$, respectively. If $S^2$ represents the variance of a single measurement,
equation (9) may be written in the form:

$$S_{a_i}^2 = \left( \frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \cdots + \frac{1}{n_{is}} \right) \frac{S^2}{s^2} \tag{10}$$

The value of $S^2$ may be estimated from the individual measurements in the
various cells. $S^2$ is nothing more than the variance within cells, as customarily
calculated, and may be estimated from the $N - rs$ degrees of freedom within
cells, in which $N$ represents the total number of measurements in Table 1.

The variance of a single measurement may also be estimated from the observed
distribution of the means of the type, $m_{a_i}$. These means are not of equal weight.
Therefore, in order to find the variance of any one of them, it is first necessary
to calculate the weighted mean of the $r$ individual means. Since the weight of
an arithmetic mean is inversely proportional to its variance, it is evident from
an inspection of equation (10) that the weight, $p_{a_i}$, of a mean, $m_{a_i}$, may be
found from the equation:

$$\frac{1}{p_{a_i}} = \frac{1}{n_{i1}} + \frac{1}{n_{i2}} + \cdots + \frac{1}{n_{is}} \tag{11}$$

The weighted mean, $m_a$, may then be found:

$$m_a = \frac{p_{a_1} m_{a_1} + p_{a_2} m_{a_2} + \cdots + p_{a_r} m_{a_r}}{p_{a_1} + p_{a_2} + \cdots + p_{a_r}} \tag{12}$$
The variance, $S_a'^2$, of any mean, $m_{a_i}$, as estimated from the observed distribution
of means of this type, is given by:

$$S_a'^2 = \frac{1}{p_{a_i} (r - 1)} \left[ p_{a_1} (m_{a_1} - m_a)^2 + p_{a_2} (m_{a_2} - m_a)^2 + \cdots + p_{a_r} (m_{a_r} - m_a)^2 \right]. \tag{13}$$

By substituting $S_a'^2$ for $S_{a_i}^2$, and $S_a^2$ for $S^2$, in equation (10) and solving the
resulting equation for $S_a^2$, an estimate, $S_a^2$, of the variance of a single measurement
is obtained from the observed distribution of means of the type, $m_{a_i}$. It
is evident that, after making the indicated substitutions, equation (10) reduces
to the form:

$$S_a^2 = \frac{s^2}{r - 1} \left[ p_{a_1} (m_{a_1} - m_a)^2 + p_{a_2} (m_{a_2} - m_a)^2 + \cdots + p_{a_r} (m_{a_r} - m_a)^2 \right]. \tag{14}$$

It is interesting to observe that, if the numbers of measurements in the respective
cells were equal, equation (14) would reduce to the formula for calculating
the variance “between rows” as customarily applied in analysis of
variance.

The two estimates, $S_a^2$ and $S^2$, of the variance of a single measurement may
be compared in the usual manner by taking one-half of the natural logarithm
of the ratio of the larger estimate to the smaller and making use of the tables
of the values of “z” given by Fisher (1932). When using these tables, it is
important to remember that $S_a^2$ was estimated from $r - 1$ degrees of freedom.
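
The calculation for the rows may be sketched as follows. The function name and the data layout, in which `cells[i][j]` is the list of measurements in the cell for row $a_i$ and column $b_j$, are assumptions made for illustration; no cell may be empty, and the total number of measurements must exceed $rs$.

```python
def weighted_squares_of_means_rows(cells):
    """Estimates S_a^2 (equation 14) and S^2 (variance within cells)."""
    r, s = len(cells), len(cells[0])
    counts = [[len(cell) for cell in row] for row in cells]
    means = [[sum(cell) / len(cell) for cell in row] for row in cells]

    # Equation (8): unweighted mean of the cell means in each row.
    row_means = [sum(row) / s for row in means]
    # Equation (11): weight of each unweighted row mean.
    weights = [1.0 / sum(1.0 / n for n in row) for row in counts]
    # Equation (12): weighted mean of the row means.
    m_a = sum(p * m for p, m in zip(weights, row_means)) / sum(weights)
    # Equation (14): estimate from r - 1 degrees of freedom.
    s_a_squared = s**2 / (r - 1) * sum(
        p * (m - m_a) ** 2 for p, m in zip(weights, row_means)
    )

    # Variance within cells, from N - rs degrees of freedom.
    total_n = sum(sum(row) for row in counts)
    ss_within = sum(
        (x - means[i][j]) ** 2
        for i in range(r) for j in range(s) for x in cells[i][j]
    )
    s_squared = ss_within / (total_n - r * s)
    return s_a_squared, s_squared


# Equal cell numbers: the result agrees with the usual variance between rows.
print(weighted_squares_of_means_rows([[[1, 3], [2, 4]], [[5, 7], [6, 8]]]))
```

With two measurements in every cell, as in this example, the first value is 32, which is also what the customary formula for the variance between rows gives for these data.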

The method of analysis just described may be employed to study the effects
of differences among the experimental treatments indicated by the arguments,
$b_j$, on the magnitude of the quantity measured. The unweighted means for
the $s$ columns are:

$$
\begin{aligned}
m_{b_1} &= \frac{1}{r} (m_{11} + m_{21} + \cdots + m_{r1}) \\
m_{b_2} &= \frac{1}{r} (m_{12} + m_{22} + \cdots + m_{r2}) \\
&\;\;\vdots \\
m_{b_s} &= \frac{1}{r} (m_{1s} + m_{2s} + \cdots + m_{rs})
\end{aligned}
\tag{15}
$$

The weight, $p_{b_j}$, of a mean of the type, $m_{b_j}$, may be found from the relation:

$$\frac{1}{p_{b_j}} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} + \cdots + \frac{1}{n_{rj}} \tag{16}$$

A weighted mean, $m_b$, may be calculated:

$$m_b = \frac{p_{b_1} m_{b_1} + p_{b_2} m_{b_2} + \cdots + p_{b_s} m_{b_s}}{p_{b_1} + p_{b_2} + \cdots + p_{b_s}} \tag{17}$$

An estimate, $S_b^2$, of the variance of a single measurement may be obtained from
the observed distribution of means of the type, $m_{b_j}$, by the use of the equation:

$$S_b^2 = \frac{r^2}{s - 1} \left[ p_{b_1} (m_{b_1} - m_b)^2 + p_{b_2} (m_{b_2} - m_b)^2 + \cdots + p_{b_s} (m_{b_s} - m_b)^2 \right]. \tag{18}$$

$S_b^2$ may be compared with $S^2$ in the usual manner.

If it is necessary to study the “interaction between rows and columns,” the
effects of the experimental treatments, $b_j$, may be studied for each individual
row of Table 1. Consider the distribution of the means of the cells in a row
designated by the argument, $a_i$. The weight of any one of these means is
equal to the number of measurements in the cell. A weighted mean, $m'_{a_i}$, of
the $s$ means of cells in the row may be calculated:

$$m'_{a_i} = \frac{n_{i1} m_{i1} + n_{i2} m_{i2} + \cdots + n_{is} m_{is}}{n_{i1} + n_{i2} + \cdots + n_{is}} \tag{19}$$

The variance, $S_{ij}'^2$, of the mean, $m_{ij}$, for any cell in the given row, as estimated
from the observed distribution of means of this type, may be obtained from the
equation:

$$S_{ij}'^2 = \frac{1}{n_{ij} (s - 1)} \left[ n_{i1} (m_{i1} - m'_{a_i})^2 + n_{i2} (m_{i2} - m'_{a_i})^2 + \cdots + n_{is} (m_{is} - m'_{a_i})^2 \right]. \tag{20}$$

The variance, $S_{ij}^2$, of the same mean, as estimated from the distribution of the
individual measurements in the cell, may be obtained from the equation:

$$S_{ij}^2 = \frac{S^2}{n_{ij}} \tag{21}$$

By substituting $S_{ij}'^2$ for $S_{ij}^2$, and $S_{ab_i}^2$ for $S^2$, in equation (21) and solving the
resulting equation for $S_{ab_i}^2$, an estimate, $S_{ab_i}^2$, of the variance of a single measurement
is obtained from the observed distribution of the means of the cells
in the given row. After making the indicated substitutions, equation (21)
reduces to the form:

$$S_{ab_i}^2 = \frac{1}{s - 1} \left[ n_{i1} (m_{i1} - m'_{a_i})^2 + n_{i2} (m_{i2} - m'_{a_i})^2 + \cdots + n_{is} (m_{is} - m'_{a_i})^2 \right]. \tag{22}$$








Such an estimate, $S_{ab_i}^2$, of the variance of a single measurement may be
obtained for each of the $r$ rows in Table 1. By calculating the average, $S_{ab}^2$,
of the variances of the type, $S_{ab_i}^2$, an estimate, $S_{ab}^2$, of the variance of a single
measurement may be obtained from the $r(s - 1)$ degrees of freedom between
cells in the same row:

$$S_{ab}^2 = \frac{1}{r(s - 1)} \sum_{i=1}^{r} \left[ n_{i1} (m_{i1} - m'_{a_i})^2 + n_{i2} (m_{i2} - m'_{a_i})^2 + \cdots + n_{is} (m_{is} - m'_{a_i})^2 \right]. \tag{23}$$

Equation (23) is identical with the formula for calculating the variance between
cells in the same row as ordinarily applied in analysis of variance. This result
is a direct consequence of the fact that the unequal numbers of measurements
in the various cells had no distorting effect on the arithmetic means for individual
cells.

The presence or absence of interaction may be verified by comparing $S_{ab}^2$
with $S_b^2$. In general, the actual variance due to interaction can not be obtained
by the “weighted squares of means” method, for the various sums of squares 
do not possess the additive property when the analysis is made in this way. 
However, the comparison suggested above will yield sufficient information for 
most practical purposes. 

For the special case in which r or s is equal to 2, the actual variance due to 
interaction may be calculated. Suppose r = 2 in Table 1. The following 
method, suggested by Yates (1933), yields an estimate of the variance due to 
interaction from a consideration of the differences, d;, between the means of 
the two cells in each column: 


$$d_1 = m_{11} - m_{21}, \qquad d_2 = m_{12} - m_{22}, \qquad \ldots, \qquad d_s = m_{1s} - m_{2s} \tag{24}$$

The variance, $S^{2}_{d_j}$, of any difference, $d_j$, is given by the equation:

$$S^{2}_{d_j} = \left(\frac{1}{n_{1j}} + \frac{1}{n_{2j}}\right)S^2 \tag{25}$$

The weight, $p_j$, of the difference, $d_j$, is given by the equation:

$$\frac{1}{p_j} = \frac{1}{n_{1j}} + \frac{1}{n_{2j}} \tag{26}$$

The variance of the difference, $d_j$, as estimated from the observed distribution
of differences, is given by the equation:

$$S'^{2}_{d_j} = \frac{1}{p_j(s-1)}\left[p_1(d_1-\bar{d})^2 + p_2(d_2-\bar{d})^2 + \cdots + p_s(d_s-\bar{d})^2\right] \tag{27}$$





126 WALTER A. HENDRICKS 


in which:

$$\bar{d} = \frac{p_1d_1 + p_2d_2 + \cdots + p_sd_s}{p_1 + p_2 + \cdots + p_s} \tag{28}$$


By means of these relations, an estimate, $S_3^2$, of the variance of a single measurement
may be obtained from the observed distribution of the differences of the
type, $d_j$. This estimate represents the variance due to interaction and may be
obtained from the equation:

$$S_3^2 = \frac{1}{s-1}\left[p_1(d_1-\bar{d})^2 + p_2(d_2-\bar{d})^2 + \cdots + p_s(d_s-\bar{d})^2\right] \tag{29}$$

It is quite apparent that s — 1 degrees of freedom are available for the esti- 
mation of the variance due to interaction in this particular example. 
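
The two-row calculation of equations (24), (26), (28) and (29) can be sketched as follows. This is only a minimal illustration: the function name and the sample counts and means are assumptions introduced here, not data from the paper.

```python
def interaction_variance(n1, n2, m1, m2):
    """Interaction estimate of equation (29) for a table with r = 2 rows.

    n1, n2: numbers of measurements in the cells of rows 1 and 2
    m1, m2: the corresponding cell means
    """
    s = len(m1)
    d = [a - b for a, b in zip(m1, m2)]                        # equation (24)
    p = [1.0 / (1.0 / a + 1.0 / b) for a, b in zip(n1, n2)]    # equation (26)
    d_bar = sum(pj * dj for pj, dj in zip(p, d)) / sum(p)      # equation (28)
    return sum(pj * (dj - d_bar) ** 2 for pj, dj in zip(p, d)) / (s - 1)


# Hypothetical example with s = 3 columns.
print(interaction_variance([4, 6, 5], [3, 7, 2], [10.0, 12.0, 11.0], [9.0, 13.0, 10.0]))
```

When the differences between the two rows are the same in every column the estimate is zero, which is the no-interaction case.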


REFERENCES 


FISHER, R. A., 1932. Statistical Methods for Research Workers, 4th edition. Edinburgh
and London: Oliver and Boyd.

SNEDECOR, GEORGE W., 1934. Calculation and Interpretation of Analysis of Variance and
Covariance. Ames, Iowa: Collegiate Press.

YATES, F., 1933. The principles of orthogonality and confounding in replicated
experiments. Jour. Agr. Sci., 23: 108-145.


BUREAU OF ANIMAL INDUSTRY,
U. S. DEPARTMENT OF AGRICULTURE,
WASHINGTON, D. C.











NOTE ON THE DISTRIBUTIONS OF THE STANDARD DEVIATIONS 
AND SECOND MOMENTS OF SAMPLES FROM A 
GRAM-CHARLIER POPULATION 


By G. A. BAKER 


T. N. Thiele in his “Theory of Observations” makes the following statement
with regard to the distributions of the higher half-invariants in samples of n:
“Not even for $\mu_2$ have I discovered the general law of errors.”¹ The purpose
of this paper is to shed some light on the distribution of $\mu_2$ and to give the
distribution of second moments about a fixed point when the sampled population
can be represented by a Gram-Charlier series.

The distribution of the second moments about a fixed point of samples is 
given in complete generality. It is known that if the sampled population is 
normal there is a simple relation between the distribution of the standard 
deviations of samples of n and the distribution of the second moments of the 
samples about the mean of the population. It was thought that such a relation 
might exist in case the sampled population could be represented by a Gram- 
Charlier series. Such is not the case. Again, it was thought that by obtaining 
the distribution of the standard deviations for samples of 2, 3, 4, - - - it might 
be possible to deduce empirically a general law of distribution. This proved an 
unfruitful line of investigation but required so much labor that the results 
should be reported to save others time and energy. 

First, suppose that a population may be represented as

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_5\varphi_5(x) + \cdots \tag{1}$$

where

$$\varphi_i(x) = \frac{d^i\left(e^{-\frac{1}{2}x^2}\right)}{dx^i}.$$

Then applying Theorem II of the author’s paper on “Random Sampling from
Non-Homogeneous Populations”² we deduce at once the following theorem.
THEOREM I. The distribution of the second moments about the origin of
(1) of samples of n drawn at random from a population represented by (1) is
precisely the same as the distribution of the second moments about the same
point of samples of n drawn from a population represented by the first term of
(1), that is a normal population, and is proportional to $x^{\frac{n-2}{2}} e^{-\frac{1}{2}nx}$ (loc. cit.).





1 Thiele, T. N., “The Theory of Observations,” reprinted in the Annals of Mathematical
Statistics, Vol. 2, No. 2, May, 1931, p. 208.
2 Metron, Vol. 8, No. 3, Feb. 28, 1930.









This is not so surprising as it may seem at first if it is remembered that the
odd subscript terms of a Gram-Charlier series slice off frequencies on one side
of the mean of $a_0\varphi_0(x)$ and add them onto the other side in the same manner.

If we suppose that a population is given as

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) + \cdots \tag{2}$$

in the same manner we get the following theorem.

THEOREM II. The distribution of the second moments measured from the
origin of (2) of samples of n drawn at random from (2) will be a combination
of distributions of the type of Theorem I with only even subscript terms
contributing anything. The variations in the component distributions will consist
of differences in the constant factors and the exponent of x, the estimate of the
second moment. The lowest exponent will be $\frac{n-2}{2}$.


For instance, if

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) \tag{3}$$

and n = 2, the estimates of the second moment will be distributed as
proportional to

$$e^{-x}\left[(a_0 + 3a_4)^2 - 12a_4(a_0 + 3a_4)x + (36a_4^2 + 6a_0a_4 + 18a_4^2)\frac{x^2}{2!} - 36a_4^2\frac{x^3}{3!} + 9a_4^2\frac{x^4}{4!}\right].$$

Thus, it can be said that we know the distribution of the second moments of 
samples about a fixed point if the sampled population is of the Gram-Charlier 
type in the sense that given the number of terms necessary for an adequate 
representation and the number in the samples we can write down the desired 
distribution. However, this is not a simple matter. Further, if some relation 
existed between the distributions of the second moments about a fixed point 
and the standard deviations of the samples we would know the latter distribution 
also. Such a relation is not apparent for samples of 2 and 3. 
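
The n = 2 case can be checked numerically. The sketch below is not part of the original note: it assumes the second moment is $x = (x_1^2 + x_2^2)/2$, removes the common factor $e^{-x}$, averages $f(x_1)f(x_2)$ over the circle $x_1^2 + x_2^2 = 2x$, and compares the result with the bracketed polynomial. The function names and the trial coefficients are hypothetical.

```python
import math


def gc_poly(t, a0, a3, a4):
    """f(t) / exp(-t^2 / 2) for f = a0*phi_0 + a3*phi_3 + a4*phi_4."""
    phi3 = 3 * t - t ** 3              # third derivative of exp(-t^2/2), exponential removed
    phi4 = t ** 4 - 6 * t ** 2 + 3     # fourth derivative, exponential removed
    return a0 + a3 * phi3 + a4 * phi4


def angular_average(x, a0, a3, a4, steps=64):
    """Mean of f(x1) f(x2) / exp(-x) over the circle x1^2 + x2^2 = 2x."""
    r = math.sqrt(2 * x)
    total = 0.0
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        total += gc_poly(r * math.cos(theta), a0, a3, a4) * gc_poly(r * math.sin(theta), a0, a3, a4)
    return total / steps


def bracket(x, a0, a4):
    """The polynomial in square brackets for n = 2."""
    return ((a0 + 3 * a4) ** 2
            - 12 * a4 * (a0 + 3 * a4) * x
            + (36 * a4 ** 2 + 6 * a0 * a4 + 18 * a4 ** 2) * x ** 2 / 2
            - 36 * a4 ** 2 * x ** 3 / 6
            + 9 * a4 ** 2 * x ** 4 / 24)


for x in (0.3, 1.0, 2.5):
    print(x, angular_average(x, 1.0, 0.2, 0.05), bracket(x, 1.0, 0.05))
```

The coefficient $a_3$ enters the average but not the bracket, so agreement of the two columns also illustrates the statement of Theorem II that only even subscript terms contribute.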

Let us investigate the correlation surfaces of the means and standard deviations
of samples of 2 and 3 drawn at random from a population represented
by the first few terms of a Gram-Charlier series after the method of Dr. A. T.
Craig.³ The distributions of the standard deviations can then be obtained
immediately by integration.

Suppose that

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) + a_4\varphi_4(x) \tag{4}$$





3 Annals of Mathematical Statistics, Vol. 3, No. 2, May, 1932, pp. 126-140. 










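and that we are considering samples of 2. The probability of the concurrence
of $x_1$ and $x_2$ is

$$f(x_1)\,f(x_2) \tag{5}$$

and

$$x_1 = -s + \bar{x}, \qquad x_2 = s + \bar{x} \tag{6}$$

where s is the standard deviation and $\bar{x}$ is the mean of a sample of 2. By means
of (6), (5) becomes

$$\begin{aligned}
e^{-(s^2+\bar{x}^2)}\big[\,& a_0^2 + a_0a_3(-6s^2\bar{x} - 2\bar{x}^3 + 6\bar{x}) \\
&+ a_0a_4(2s^4 + 2\bar{x}^4 + 12s^2\bar{x}^2 - 12s^2 - 12\bar{x}^2 + 6) \\
&+ a_3^2(-s^6 + 3s^4\bar{x}^2 + 6s^4 - 3s^2\bar{x}^4 - 9s^2 + 9\bar{x}^2 - 6\bar{x}^4 + \bar{x}^6) \\
&+ a_3a_4(2s^6\bar{x} - 6s^4\bar{x}^3 - 6s^4\bar{x} + 6s^2\bar{x}^5 - 12s^2\bar{x}^3 + 18s^2\bar{x} - 2\bar{x}^7 + 18\bar{x}^5 - 42\bar{x}^3 + 18\bar{x}) \\
&+ a_4^2(s^8 - 4s^6\bar{x}^2 - 12s^6 + 6s^4\bar{x}^4 + 12s^4\bar{x}^2 + 42s^4 - 4s^2\bar{x}^6 \\
&\qquad + 12s^2\bar{x}^4 - 36s^2\bar{x}^2 - 36s^2 + \bar{x}^8 - 12\bar{x}^6 + 42\bar{x}^4 - 36\bar{x}^2 + 9)\big].
\end{aligned} \tag{7}$$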


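To find the distribution of s we must integrate from $-\infty$ to $\infty$ with respect
to $\bar{x}$. Thus, (8) is obtained.

$$\sqrt{\pi}\,e^{-s^2}\left[a_0^2 + a_0a_4(2s^4 - 6s^2) + a_3^2\left(-s^6 + \frac{15}{2}s^4 - \frac{45}{4}s^2 + \frac{15}{8}\right) + \frac{3}{2}a_0a_4 + a_4^2\left(s^8 - 14s^6 + \frac{105}{2}s^4 - \frac{105}{2}s^2 + \frac{105}{16}\right)\right]. \tag{8}$$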


If we retain only two terms of (3), i.e. use

$$f(x) = a_0\varphi_0(x) + a_3\varphi_3(x) \tag{9}$$

and consider samples of 3 we obtain as the correlation surface of $\bar{x}$ and s


9 
187 ete at _ 29M 


sé 0 4 


oni (—40z* + 247s” — 247) 
V3 


9 
a,a; om a 
a -= (—84e° + 52527 s* — 27522‘ ¢* 


(10) +. 576st — 1008x2s? — 288s? — 55862° + 2702 — 17282’) 
3 
é at (288° — 61892?s' — 282s? — 6292° + 288s' + 1344: 


+ 4608275? — 288s? + 729°) | 










The distribution of s can be obtained as before. The processes involved in 
obtaining (7) and (10) are so complicated that the general rule for writing the 
distribution of s is not apparent. Also, the relation of the distributions of s to 
the corresponding distributions of the second moments about a fixed point is 
not apparent. 

In summary, the general distributions of the second moments about a fixed 
point of samples from a population represented by a definite number of terms 
of a Gram-Charlier series and the distributions of the standard deviations of 
samples of 2 and 3 from the same type of population are given and compared. 
No apparent relation exists between them. 





ON THE FINITE DIFFERENCES OF A POLYNOMIAL
By I. H. BARKEY
In this paper an apparently new and convenient method of finding the successive
finite differences of a polynomial is considered. If operationally

$$\varphi(u + m\omega) = E_\omega^{m}\,\varphi(u) = (1 + \Delta_\omega)^{m}\,\varphi(u)$$

then for any polynomial f(x) of degree “n”

$$\begin{aligned}
f(x) &= p_0x^n + p_1x^{n-1} + \cdots + p_n \\
     &= p_0(x+a)^n + q_{11}(x+a)^{n-1} + \cdots + q_{1n} \\
E^{a}f(x) &= p_0(x+a)^n + p_1(x+a)^{n-1} + \cdots + p_n \\
\Delta_a f(x) &= (p_1 - q_{11})(x+a)^{n-1} + (p_2 - q_{12})(x+a)^{n-2} + \cdots + (p_n - q_{1n}).
\end{aligned}$$

Similarly, if $f_1(x) = \Delta_a f(x)$, then

$$\begin{aligned}
f_1(x) &= (p_1 - q_{11})(x+2a)^{n-1} + q_{22}(x+2a)^{n-2} + \cdots + q_{2n} \\
E^{a}f_1(x) &= (p_1 - q_{11})(x+2a)^{n-1} + (p_2 - q_{12})(x+2a)^{n-2} + \cdots + (p_n - q_{1n}) \\
\Delta_a f_1(x) &= (p_2 - q_{12} - q_{22})(x+2a)^{n-2} + \cdots + (p_n - q_{1n} - q_{2n})
\end{aligned}$$

and so on for the higher orders, since $\Delta_a f_{s-1}(x) = \Delta_a^{s} f(x)$. In the practical
application of this method, “a” may be conveniently taken as unity, and an
abridged form of synthetic division employed. Thus, if


$f(x) = 5x^4 + 3x^3 + 7x^2 - 2x + 3$, then


  5 +   3 +   7 -   2 +   3
    -   2 +   9 -  11 +  14
    -   7 +  16 -  27
    -  12 +  28
    -  17
 20 -  21 +  25 -  11           = f_1
    -  41 +  66 -  77
    -  61 + 127
    -  81
 60 - 102 +  66                 = f_2
    - 162 + 228
    - 222
120 - 162                       = f_3
    - 282
120                             = f_4.










As is evident from the darkened numerals, all figures to the right of the dotted 
line are redundant and may be omitted. From the above, 


$$\begin{aligned}
\Delta f(x) &= 20(x+1)^3 - 21(x+1)^2 + 25(x+1) - 11 \\
\Delta^2 f(x) &= 60(x+2)^2 - 102(x+2) + 66 \\
\Delta^3 f(x) &= 120(x+3) - 162 \\
\Delta^4 f(x) &= 120.
\end{aligned}$$
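
The procedure lends itself to a short program. The following Python sketch is one possible reading of the method, with hypothetical function names: the polynomial is re-expanded in powers of (x + a) by repeated synthetic division, and the new coefficients are subtracted from the old ones to give the coefficients of the next difference.

```python
def shift_expand(coeffs, a=1):
    """Coefficients (highest power first) of the same polynomial in powers of (x + a).

    Uses repeated synthetic division by (x + a).
    """
    c = list(coeffs)
    n = len(c) - 1
    for i in range(n):
        for j in range(1, n - i + 1):
            c[j] -= a * c[j - 1]
    return c


def successive_differences(coeffs, a=1):
    """Return the coefficient rows of f_1, f_2, ... where f_k is the k-th difference.

    Row k holds the coefficients of f_k in descending powers of (x + k*a).
    """
    rows = []
    p = list(coeffs)
    while len(p) > 1:
        q = shift_expand(p, a)
        p = [pk - qk for pk, qk in zip(p[1:], q[1:])]   # p_k - q_k, leading terms cancel
        rows.append(p)
    return rows


# The example of the text: f(x) = 5x^4 + 3x^3 + 7x^2 - 2x + 3 with a = 1.
for row in successive_differences([5, 3, 7, -2, 3]):
    print(row)
```

Each pass shortens the coefficient list by one, so a polynomial of degree n needs n passes.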





SOME PRACTICAL INTERPOLATION FORMULAS 
By JOHN L. ROBERTS


Sometimes we wish to find by means of interpolation an approximation to a
particular value of $w_x$ in the interval between the known values, $w_0$ and $w_1$.
But it also might be desirable in the interval from $w_0$ to $w_1$ to interpolate several
approximations to $w_x$ at equidistant values of x. It is very important to know
that a formula which might be very satisfactory to interpolate a particular value
in an interval might seriously fail to be the most satisfactory formula when it
is desired to interpolate several values in the same interval. The range of this
paper is so limited that we only wish to find by means of interpolation several
approximations to the true value of $w_x$ in the interval from $w_0$ to $w_1$ at equidistant
values of x.

One way to perform an interpolation of this sort is to use osculatory
interpolation.¹ The real function of osculatory interpolation is to secure smoothness
at the known points, which are sometimes called pivotal points. By
roughness is meant that one or more of the successive derivatives are discontinuous
at the pivotal points. Experience proves that the osculatory formulas
usually secure smoothness either at the expense of labor or by a loss of accuracy
over the entire range from $w_0$ to $w_1$. Frequently the function of interpolation
formulas is to save labor. In many cases it appears reasonable to save labor 
by a loss of both smoothness and accuracy. Formulas are herein selected, 
without direct regard for smoothness, so as to secure the best possible compro- 
mise between a maximum of accuracy and a minimum of labor. It appears 
that this results in many cases in a loss of smoothness that is no more objection- 
able than the loss in accuracy. 

The actuarial profession, while trying to perfect their methods of constructing 
mortality tables, have made contributions of a high order of scholarship to the 
theory of osculatory interpolation. But since the statistician, the astronomer, 
the physicist, and other scientists also have occasions to make interpolations, 
it seems to be very important to discuss the problem of finding the most prac- 
tical methods of interpolation, not only from the special viewpoint of the 
actuary, but also from the general viewpoint of mathematics. 

$\Delta w_x$ is called the first difference of $w_x$, and may be defined by $\Delta w_x = w_{x+1} - w_x$.


1 Since this paper presupposes certain knowledge on the part of the reader, it may be 
worth while to indicate some sources of this knowledge. The elementary parts of this 
knowledge can be found in any good book on finite differences. ‘‘Population Statistics 
and Their Compilation’? by Hugh H. Wolfenden, published by the Actuarial Society of 
America, contains an excellent summary of osculatory interpolation. This summary 
indicates some valuable sources of information. 












Second, third, and higher differences are merely successive differences of the
first. When use is made of central difference interpolation formulas, it is
convenient to adopt Woolhouse’s notation, which is defined by means of the
following equations: $\Delta w_{-2} = a_{-2}$, $\Delta w_{-1} = a_{-1}$, $\Delta w_0 = a_1$, $\Delta w_1 = a_2$, $\Delta^2 w_{-2} = b_{-1}$,
$\Delta^2 w_{-1} = b_0$, $\Delta^2 w_0 = b_1$, $\Delta^3 w_{-2} = c_{-1}$, $\Delta^3 w_{-1} = c_1$, $\Delta^4 w_{-2} = d_0$, $\Delta^5 w_{-2} = e_1$, $\Delta^6 w_{-3} = f_0$,
etc.

An important family of curves can be represented by

$$u_x = u_0 + xa_1 + \frac{1}{2}x(x-1)B + \frac{1}{6}x(x-1)\left(x - \frac{1}{2}\right)C. \tag{1}$$

Assume $u_0 = w_0$ and $\Delta u_0 = \Delta w_0$. Then a study of (1) shows that $a_1$, which
has already been defined, must be a factor in the second term in order that (1)
may be satisfied when x = 1. (1) is a third degree equation. However, if
C = 0, (1) becomes a second degree equation; if both B = 0 and C = 0, (1)
becomes a first degree equation. In other words, by giving B and C proper
values, (1) can be made to become many different interpolation formulas.

For many purposes interpolation by a first degree formula is not sufficiently
accurate. We, therefore, might wish to interpolate by either a second or a
third degree formula. Since it is possible to draw an unlimited number of
second degree curves or third degree curves between the points $P_0$ and $P_1$, the
problem of selecting the best second degree interpolation curve and the best
third degree curve is of great practical importance.














I 





Suppose that $w_{-2}$, $w_{-1}$, $w_0$, $w_1$, $w_2$, and $w_3$ can be found in a table of values
of the function $w_x$, and that we wish to find by means of interpolation several
approximate values of $w_x$ in the interval from $w_0$ to $w_1$. These six given values
of $w_x$ can be used to determine six pivotal points, which determine a fifth degree
curve. Suppose this curve represents the function $v_x$. Then $w_x$ and $v_x$ would
have exactly the same values at the six pivotal points, but would have values
which are only approximately the same at other points. Using the first six
terms of the Gauss central difference interpolation formula, we have

$$\begin{aligned}
v_x = w_0 + xa_1 &+ \frac{1}{2}x(x-1)b_0 + \frac{1}{6}(x+1)x(x-1)c_1 \\
&+ \frac{1}{24}(x+1)x(x-1)(x-2)d_0 \\
&+ \frac{1}{120}(x+2)(x+1)x(x-1)(x-2)e_1.
\end{aligned}$$

It is proper to use in this formula the differences $a_1$, $b_0$, etc., which have already
been defined as differences of $w_x$ because these differences are exactly equal to
the corresponding differences of $v_x$. Suppose $P_0$, $P_{1/3}$, $P_{2/3}$, and $P_1$ are four points





















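which are determined by $v_x$. Then B and C can be determined so that (1) will
represent the curve which can go through these four points.

Then

$$u_{1/3} = w_0 + \frac{1}{3}a_1 - \frac{1}{9}\left(B - \frac{1}{18}C\right)$$

and

$$v_{1/3} = w_0 + \frac{1}{3}a_1 - \frac{1}{9}\left(b_0 + \frac{4}{9}c_1 - \frac{5}{27}d_0 - \frac{7}{81}e_1\right).$$

Also

$$u_{2/3} = w_0 + \frac{2}{3}a_1 - \frac{1}{9}\left(B + \frac{1}{18}C\right)$$

and

$$v_{2/3} = w_0 + \frac{2}{3}a_1 - \frac{1}{9}\left(b_0 + \frac{5}{9}c_1 - \frac{5}{27}d_0 - \frac{8}{81}e_1\right).$$

Since $u_{1/3} = v_{1/3}$ and $u_{2/3} = v_{2/3}$, we have two equations, which can be solved for B
and C.

$$B = b - \frac{5}{27}d \quad\text{and}\quad C = c_1 - \frac{1}{9}e_1 \tag{2}$$

where b and d are defined by

$$b = \frac{1}{2}(b_0 + b_1) \quad\text{and}\quad d = \frac{1}{2}(d_0 + d_1).$$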


A study of (1) shows that $u_{1/2}$ does not depend upon C because the term containing
C becomes zero when $x = \frac{1}{2}$, and also shows that $u_x$ over the entire range
from $u_0$ to $u_1$ is more sensitive to errors in B than errors in C. The B in (2)
usually contains some error because the six terms of the Gauss formula which
were used in determining B usually produce results which are only approximate.
Consequently a comparatively large error in C would not produce an important
error.

Assume

$$B = b - \frac{5}{27}d \quad\text{and}\quad C = c_1 - \frac{5}{27}e_1. \tag{3}$$
B is the same in both (2) and (3), but C is not the same. The accuracy of 
(2) and the accuracy of (3) do not differ by an important amount. On the 
other hand, if any attempt to apply (2) is compared with the working illustra- 
tions of (3) in this article, it will be found that (2) to an important extent is 
more laborious than (3). Therefore (3) is a better compromise between a 
maximum of accuracy and a minimum of labor than (2). For this reason (2) 

















ought not to be regarded as a practical formula. On the other hand (2) because 
of its great accuracy serves as an ideal with which other formulas can be com- 
pared. In other words (2) is of theoretical importance. 

In like manner another interpolation formula can be found if we use the first
four terms of the Gauss formula to determine $P_{1/2}$. Then

$$u_{1/2} = w_0 + \frac{1}{2}a_1 - \frac{1}{8}B$$

$$v_{1/2} = w_0 + \frac{1}{2}a_1 - \frac{1}{8}\left(b_0 + \frac{1}{2}c_1\right).$$

Since $u_{1/2} = v_{1/2}$, we can solve for B, and C is left arbitrary. If C = 0, we again
get an excellent compromise between a maximum of accuracy and a minimum
of labor. The following second degree formula results.

$$B = b \quad\text{and}\quad C = 0. \tag{4}$$


In order that the value of (3) and (4) may be appreciated, they are herein
compared with some other formulas which have been of historical importance.

If the point $P_{1/2}$ can first be accurately determined, a second degree curve
through the points $P_0$, $P_{1/2}$, and $P_1$ would probably give more accurate results
than such a curve through the points $P_0$, $P_1$, and $P_2$ because the first three
points are in a smaller neighborhood; the second curve can be represented by
the first three terms of the Gregory-Newton interpolation formula. The points
$P_{-1}$, $P_0$, $P_1$, and $P_2$ determine a third degree curve, which can be represented
by the first four terms of the Gauss central difference formula. It is probable
that these terms would determine $P_{1/2}$ much more accurately than the first three
terms of the Gregory-Newton formula because the latter is not a central difference
formula with respect to $P_{1/2}$ and because four terms usually give more
accurate results than only three terms. Consequently there is a strong probability
that (4) is more accurate than the first three terms of the Gregory-Newton
formula. In like manner (4) is more accurate than the first three terms
of the Gauss formula. It is interesting to observe that (4) is the first three
terms of the Newton-Bessel formula.


If $B = b$ and $C = 3c_1$,


then (1) is equivalent to Karup’s osculatory interpolation formula in terms of
differences taken centrally. B is the same in both (4) and Karup’s formula.
No interpolation formula can be very accurate unless C is about equal to $c_1$.
Since, then, the error in C in Karup’s formula is about twice as great as the error 
in C in (4), his formula is distinctly less accurate than (4). Since (4) is a second 
degree curve and Karup’s formula is a third degree curve, his formula is very 
much more laborious. (4) is extremely accurate for a formula having its labor 
saving properties; for many purposes its roughness and inaccuracy appear to 







be in about the right proportion. On the other hand Karup’s formula is ex- 
tremely inaccurate for a formula so laborious; its only good point is its smooth- 
ness. 


Changing somewhat the meanings of u and w, (3) may be written

$$\begin{aligned}
u_{x+n} = u_n + x\Delta u_n &+ \frac{1}{2}x(x-1)\left[\frac{1}{2}\left(\Delta^2 w_n + \Delta^2 w_{n-1}\right) - \frac{5}{54}\left(\Delta^4 w_{n-1} + \Delta^4 w_{n-2}\right)\right] \\
&+ \frac{1}{6}x(x-1)\left(x - \frac{1}{2}\right)\left(\Delta^3 w_{n-1} - \frac{5}{27}\Delta^5 w_{n-2}\right).
\end{aligned}$$

$$\frac{du}{dx} = u'_{x+n},$$

$$u'_{0+0} - u'_{1-1} = \frac{1}{54}d_0 + \frac{5}{162}f_0,$$

which is the amount of discontinuity in $\frac{du}{dx}$ at $P_0$. (3) has greater smoothness
than (4); in other words (3) is more like an osculatory formula. On the other
hand

$$B = b - \frac{1}{6}d \quad\text{and}\quad C = c_1 - \frac{1}{6}e_1, \tag{5}$$

which is equivalent to an important osculatory interpolation formula by Mr.
Robert Henderson, compares much better with (3) from the viewpoint of labor
saving and accuracy than Karup’s formula does with (4).
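
The family (1) with the choices (2), (3), (4) and (5) of B and C can be tried numerically. The sketch below is an illustration under stated assumptions, not the author's computing scheme: the function names are hypothetical, the six pivotal values are sines at 30° intervals, and the differences follow Woolhouse's notation as defined above.

```python
import math


def woolhouse(w):
    """Central differences about the interval (0, 1) from w = [w_-2, w_-1, w_0, w_1, w_2, w_3]."""
    d1 = [w[k + 1] - w[k] for k in range(5)]
    d2 = [d1[k + 1] - d1[k] for k in range(4)]
    d3 = [d2[k + 1] - d2[k] for k in range(3)]
    d4 = [d3[k + 1] - d3[k] for k in range(2)]
    return {"a1": d1[2], "b0": d2[1], "b1": d2[2], "c1": d3[1],
            "d0": d4[0], "d1": d4[1], "e1": d4[1] - d4[0]}


def b_and_c(diff, formula):
    """B and C of the family (1) for formulas (2), (3), (4) and (5)."""
    b = 0.5 * (diff["b0"] + diff["b1"])
    d = 0.5 * (diff["d0"] + diff["d1"])
    c1, e1 = diff["c1"], diff["e1"]
    if formula == 2:
        return b - 5 * d / 27, c1 - e1 / 9
    if formula == 3:
        return b - 5 * d / 27, c1 - 5 * e1 / 27
    if formula == 4:
        return b, 0.0
    if formula == 5:
        return b - d / 6, c1 - e1 / 6
    raise ValueError("formula must be 2, 3, 4 or 5")


def u(x, w0, a1, B, C):
    """Equation (1)."""
    return w0 + x * a1 + 0.5 * x * (x - 1) * B + x * (x - 1) * (x - 0.5) * C / 6


w = [math.sin(math.radians(t)) for t in (-60, -30, 0, 30, 60, 90)]
diff = woolhouse(w)
for formula in (2, 3, 4, 5):
    B, C = b_and_c(diff, formula)
    # x = .4 corresponds to 12 degrees
    print(formula, u(0.4, w[2], diff["a1"], B, C), math.sin(math.radians(12)))
```

Every member of the family reproduces $w_0$ and $w_1$ exactly, since the B and C terms vanish at x = 0 and x = 1; the formulas differ only in the interior of the interval.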


II 


An excellent formula can be easily spoiled if the method of applying it is not 
practical. Mr. Henderson, in the Transactions of the Actuarial Society of 
America, Vol. IX, applies (5) in such a way that the numerical work is very 
convenient. Some writers seem to have been very careless about this matter. 
A method intended to interpolate several values between $w_0$ and $w_1$ should
provide that the end value $w_1$ shall be exactly reproduced if no error is made in
the computation. In other words a good method should provide a check upon 
the work. At the same time, in order to avoid unnecessary labor, the work 
should not retain unnecessary decimal places or figures. In other words ficti- 
tious accuracy should be avoided. The following working illustrations are in- 
tended to show good methods of application of formulas and to show how much 
labor is necessary in order to apply them; also the size of the errors can be used 
to illustrate the theory. . 















































When (4) is applied at either end of the table, where terms are not available
for the calculation of the differences required, it should be assumed that the
fourth differences that cannot be computed vanish and the required differences
should be filled in consistently with that assumption. $\Delta w_x$ represents the first
differences. But it is convenient to have S represent the first differences in
such a manner that they are arranged centrally in the working illustration. S²
in like manner represents the second differences. The 2 in S² means S² is a
second difference, and does not have the familiar meaning used in algebra. In
the case of (4), $\Delta u_x = a_1 + xB$, $\Delta^2 u_x = B$, and the higher differences all equal
zero. Since we wish in the working illustration of (4) to interpolate four values
between $w_0$ and $w_1$, δ and δ² are defined by $\delta u_x = u_{x+.2} - u_x$ and $\delta^2 u_x = \delta u_{x+.2} - \delta u_x$.
It is proved in any good book on finite differences that there are possibilities
that Δ and δ, which are symbols of operation, can be separated from the
functions upon which they operate, and they can be treated as if they were
algebraic numbers. Consequently $1 + \delta = (1 + \Delta)^{.2}$. In other words by means
of the binomial law $\delta u_x = (.2\Delta - .08\Delta^2)u_x$, where all the terms within the
parenthesis are to be considered as operating upon $u_x$. Also $\delta^2 u_x = .04\Delta^2 u_x$. s, $s_x$, and
s² are defined by $s = s_x = \delta u_x$, and $s^2 = s^2_x = \delta^2 u_x$. Therefore the middle $s =
\delta u_{.4} = .2a_1$, and $s^2 = .04B = .02(b_0 + b_1)$. We are now in position to apply (4)
to the case when $w_n = (1.04)^n$. It might prevent confusion if it is stated that
x and n are related to each other in such a way that we always interpolate
between $w_0$ and $w_1$.








  n    (1.04)^n       s         S        s²

  80   23.050        .9218
  81   23.9718       .9603
  82   24.9321       .9988    4.994    .0385
  83   25.9309      1.0373
  84   26.9682      1.0758

  85   28.044       1.1190
  86   29.1630      1.1670
  87   30.3300      1.2150    6.075    .0480
  88   31.5450      1.2630
  89   32.8080      1.3110

  90   34.119       1.3636
  91   35.4826      1.4210
  92   36.9036      1.4784    7.392    .0574
  93   38.3820      1.5358
  94   39.9178      1.5932

  95   41.511
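
The working illustration of (4) can be reproduced with a short routine. The Python sketch below is an assumption-laden reconstruction rather than the author's layout: the function name is hypothetical, only the four pivotal values 23.050, 28.044, 34.119 and 41.511 are taken from the table, and the end second differences are filled in by letting the third difference continue unchanged.

```python
def subdivide_by_formula_4(w, parts=5, places=4):
    """Interpolate parts - 1 values in every interval of an equally spaced table w."""
    a = [w[k + 1] - w[k] for k in range(len(w) - 1)]    # first differences
    b = [a[k + 1] - a[k] for k in range(len(a) - 1)]    # second differences
    c = [b[k + 1] - b[k] for k in range(len(b) - 1)]    # third differences
    # End values filled in on the assumption that the fourth differences
    # which cannot be computed vanish.
    b = [b[0] - c[0]] + b + [b[-1] + c[-1]]
    h = 1.0 / parts
    table = [round(w[0], places)]
    for k, a1 in enumerate(a):
        B = 0.5 * (b[k] + b[k + 1])                     # formula (4): B = b, C = 0
        s2 = round(h * h * B, places)                   # constant s^2 of the interval
        mid = round(h * a1, places)                     # middle s
        for j in range(parts):
            s = mid + (j - (parts - 1) / 2) * s2        # s in arithmetical progression
            table.append(round(table[-1] + s, places))
    return table


for n, value in zip(range(80, 96), subdivide_by_formula_4([23.050, 28.044, 34.119, 41.511])):
    print(n, value)
```

Because the s values of an interval are placed symmetrically about the middle s, their sum equals the tabular first difference whenever that difference divides evenly at the chosen number of places, and each pivotal value is then reproduced exactly; this is the check on the work described in the text.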













Some of the explanation of the application of (4) applies to (3) and does
not need to be repeated. The method herein used of applying (3) is either the
same as or a development of the Henderson method of applying (5). If it is
desired to apply (3) at either end of the table, where terms are not available
for the calculation of the differences required, it can be assumed that the sixth
differences that can not be computed vanish and the required differences can
be filled in consistently with that assumption. A study of the theory underlying
this assumption shows that it does not result in a true central difference
formula and that it consequently results usually in some loss of accuracy. In
the case of (3) before the finding of the differences of (1), it is convenient to
write it as follows:

$$u_x = u_0 + xa_1 + \frac{1}{2}x(x-1)\left(B + \frac{1}{2}C\right) + \frac{1}{6}x(x-1)(x-2)C.$$

$$\Delta u_x = a_1 + x\left(B + \frac{1}{2}C\right) + \frac{1}{2}x(x-1)C.$$

$$\Delta^2 u_x = \left(B + \frac{1}{2}C\right) + xC, \quad\text{and}\quad \Delta^3 u_x = C.$$

Suppose we wish to interpolate four values between $w_0$ and $w_1$. δ and δ²
have already been defined. $\delta^3 u_x = \delta^2 u_{x+.2} - \delta^2 u_x$. Then $1 + \delta = (1 + \Delta)^{.2}$,
or $\delta u_x = (.2\Delta - .08\Delta^2 + .048\Delta^3)u_x$. Also $\delta^2 u_x = (.04\Delta^2 - .032\Delta^3)u_x$ and $\delta^3 u_x =
.008\Delta^3 u_x$. s², $s^2_x$, and s³ are defined by $s^2 = s^2_x = \delta^2 u_{x-.2}$, and $s^3 = s^3_x = \delta^3 u_x$. The
first

$$s^2 = \delta^2 u_{-.2} = .04\left(B - \frac{1}{2}C\right) = .04\left(b_0 - \frac{5}{27}d_0\right).$$

The last

$$s^2 = .04\left(B + \frac{1}{2}C\right) = .04\left(b_1 - \frac{5}{27}d_1\right).$$

.1852 might be a useful approximation to $\frac{5}{27}$. The remaining s²’s should be
filled in so that they are in arithmetical progression with irregularities at the
ends. If the irregularities can be distributed equally at both ends, the irregularities
cause an error in C, but none in B. Errors in B are more important
than those in C. The middle $s = \delta u_{.4} = .2a_1 - s^3$. In the following working
illustration, $w_n = \sin n$.











   n     sin n        S          S²         S³         S⁴

  -60   -.86603
                    .36603
  -30   -.50000                .13397
                    .50000               -.13397
    0    .00000                .00000                .00000
                    .50000               -.13397
   30    .50000               -.13397                .03588
                    .36603               -.09809
   60    .86603               -.23206
                    .13397
   90   1.00000


   n     sin n        s           s²          s³

    0    .00000     .104498     .000000
    6    .104498    .103374    -.001124
   12    .207872    .101125    -.002249    -.001125
   18    .308997    .097751    -.003374
   24    .406748    .093252    -.004499
   30    .50000                -.005624





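Suppose we wish to interpolate nine values between $w_0$ and $w_1$ by the use of
(3). Then $\delta u_x = u_{x+.1} - u_x$, $\delta^2 u_x = \delta u_{x+.1} - \delta u_x$, and $\delta^3 u_x = \delta^2 u_{x+.1} - \delta^2 u_x$.
Consequently $1 + \delta = (1 + \Delta)^{.1}$, or $\delta u_x = (.1\Delta - .045\Delta^2 + .0285\Delta^3)u_x$. Then
$\delta^2 u_x = (.01\Delta^2 - .009\Delta^3)u_x$ and $\delta^3 u_x = .001\Delta^3 u_x$. $s^2 = s^2_x = \delta^2 u_{x-.1}$, and $s^3 = s^3_x =
\delta^3 u_x$. The first

$$s^2 = \delta^2 u_{-.1} = .01\left(B - \frac{1}{2}C\right) = .01\left(b_0 - \frac{5}{27}d_0\right).$$

$$\delta u_{.4} = (.1a_1 - 4s^3) - \frac{1}{2}\delta^2 u_{.4} \quad\text{and}\quad \delta u_{.5} = (.1a_1 - 4s^3) + \frac{1}{2}\delta^2 u_{.4}.$$

The last

$$s^2 = \delta^2 u_{.9} = .01\left(B + \frac{1}{2}C\right) = .01\left(b_1 - \frac{5}{27}d_1\right).$$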





   sin n        s           s²          s³

  .00000      .052318     .000000
  .052318     .052179    -.000139
  .104497     .051899    -.000280
  .156396     .051478    -.000421
  .207874     .050916    -.000562    -.000141
  .258790     .050212    -.000703
  .309002     .049368    -.000844
  .358370     .048383    -.000985
  .406753     .047257    -.001126
  .454010     .045990    -.001267
  .50000                 -.001406








Suppose we wish to interpolate five values between $w_0$ and $w_1$. The first
$s^2 = \frac{1}{36}\left(b_0 - \frac{5}{27}d_0\right)$ and the last $s^2 = \frac{1}{36}\left(b_1 - \frac{5}{27}d_1\right)$.

$$\delta u_{1/3} = \frac{1}{6}\left(a_1 - 8\delta^3 u_x\right) - \frac{1}{2}\delta^2 u_{1/3}$$

$$\delta u_{1/2} = \frac{1}{6}\left(a_1 - 8\delta^3 u_x\right) + \frac{1}{2}\delta^2 u_{1/3}.$$

In the following working illustration the given values of sin n are written correct
to five decimal places; in other words after each decimal point there are
five symbols or digits representing numbers; also each of these symbols is written
in the scale of ten. It can be observed that some values of $u_x$, s, s², and s³ in
the working illustration have six symbols to the right of the decimal point, and
that some values have seven symbols to the right of the decimal point. In all
cases the sixth symbol to the right of the decimal point is written in the scale
of ten, and the seventh symbol is written in the scale of six. This procedure
provides a check by exactly reproducing $w_1$. Also this procedure does not cause
much fictitious accuracy, and can be quickly used after a little practice.


   sin n         s            s²          s³

  .00000       .0871305     .000000
  .0871305     .0864795    -.000651
  .1736104     .0851775    -.001302
  .2587883     .0832245    -.001953    -.000651
  .3420132     .0806205    -.002604
  .4226341     .0773655    -.003255
  .50000                   -.003906










In general if we wish to interpolate i − 1 values between $w_0$ and $w_1$ when i
is neither five nor ten, $w_1$ can be exactly reproduced if some of the symbols are
written in the scale of i. If i = 12, it is evident that we need two extra symbols,
say t and e, to stand for ten and eleven respectively. If we wish to interpolate
i − 1 values between $w_0$ and $w_1$ by the use of (4), in the computation each of
$u_x$, s and s² except the given values should contain one more symbol than each
given value contains, and the extra symbol should be written in the scale of i.
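
The device of writing the extra symbol in the scale of i can be mimicked with integer arithmetic. In the sketch below, which is an illustration and not the author's procedure, every number is held as a whole count of units of $10^{-6}/6$; the helper names are hypothetical, and the only values taken from the working illustration for five interpolated values are the first s, .0871305, and the constant s³ of −.000651.

```python
def to_units(text, scale=6):
    """'.0871305' -> count of units of 10**-6 / scale (seventh symbol is in that scale)."""
    digits = text.replace(".", "").strip()
    return int(digits[:6]) * scale + int(digits[6])


def to_text(units, scale=6):
    """Inverse of to_units for non-negative values below 1."""
    whole, extra = divmod(units, scale)
    return ".%06d%d" % (whole, extra)


scale = 6
s = to_units(".0871305", scale)     # first s of the illustration
s3_step = 651 * scale               # size of s^3, a six-place decimal
u, s2 = 0, 0
column = [to_text(u, scale)]
for _ in range(6):
    u += s                          # next interpolated value
    column.append(to_text(u, scale))
    s2 += s3_step                   # s^2 grows in arithmetical progression
    s -= s2                         # next s
print(column)
```

All the additions are exact, so the last entry comes out as .5000000 with nothing left over, which is the check on the work that the mixed scale is meant to provide.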





ON EVALUATING A COEFFICIENT OF PARTIAL CORRELATION 
By GRACE STRECKER


It is to be shown here that when the multiple correlation coefficient $R_{n\cdot 12\cdots(n-1)}$
is found by the method of Horst¹ the partial correlation coefficient $R_{n(n-1)\cdot 12\cdots(n-2)}$
can be found in terms of the β’s. If we are interested only in evaluating a
partial correlation between two variables, we may also employ the method which
will be given here.

Without loss of generality the dependent variables may be chosen to be the
nth and (n − 1)st. The coefficient of partial correlation as given by Rietz²
may be expressed in the following form:


Ronny 
(1) Rig—1y;1 (n—2) = / R (n—1)(n—1ynn Bun 
n(n—1); 12-++-(n—2) = - - tee : 


4/ Ryn 


R (n—1)(n—1) nn 


R .-1)(n-1) May be treated as a new determinant F’. Regarding its elements 
as the coefficients of a set of normal equations (n — 1 in all) whose constant 
terms are zero, we may follow through the Doolittle elimination process. For 
the case where n = 4 we have the table given below. 

In comparing this outline with the one illustrating the Doolittle elimination 
process for R when n = 4 we see that 
Ay 


/ 
Y11 Yu = Pe? 


rA 1122 
? 
R?Ay, 


3 . 3 
/ 
Q33 — 2 Bes = i = > Bis. 
: 7 


= Tn = 


Therefore, we have 


3 
Au TA 
a 
R RA, \ <= o 
I] Vit | 4a — >) Bis 
1 9 


1 Horst Paul, A Short Method for Solving for a Coefficient of Multiple Correlation, An- 
nals of Mathematical Statistics, Vol. III, No. 1, Feb. 1932, pp. 40-44. 
2 Rietz, H. L., Mathematical Statistics, p. 101. 












Reciprocal 1 





~ | 2 
A 11 R2 
P | i 


| 
| 


In the general case: 




RPA Arie 


, 3 ; 
as = 2 ies 


/ 

fa "F144 

, 

so Fees 

/ 

n—2 2) = V(n—2)(n—2)s 
n—1 


Q 

7 
- 

~ 


a, | V1 
| 
| 
} ae 
| 61 
/ 
Qo 
/ 
Boo 
| 
/ 
Y2 
ae 
Go 
, 
Qs 
8. 
. 
‘ «ovo 
at 
B33 
/ 
Y3 
iv 
03 













Hence 
n—2 n—1 
Ron—1)(n—-1) = R’ _ II Vii (on — Z Bin) 
1 2 
n—2 
Since R = 


II vi, then Ron—1)(n—1ynn = [] 7, from which we see that 
1 1 


n—2 n—1 
‘ TT vs (em — Bm) ne 
(n—1) (n—1) 1 2 
: = Ann — >» Bin 


ri—2 
Roa (n—-1ynn Tl 2 
Vii 
1 


But since @nn = 1, then 








Rn (2-1) .* 


we = 1 >) Bu. 


| a 


It has been shown that 


z=! -» me 


‘ ‘ » R nl wail 
Substituting the above values for —°"-?“" and 


(n—1) (n—1) nn 


= — in equation (1), we have 


/1~ Fou —(1- En) 
VV t+ Fe 





Ran»; 12-+-(n—2) = 





or 


B 
Racn—1; 12--- (n—2) = /— —- + 


1— dX - 


Hence it is seen that when the β’s given by Horst (page 42) are calculated,
it is an easy matter to solve for the partial correlation $R_{n(n-1)\cdot 12\cdots(n-2)}$.


ST. LOUIS UNIVERSITY.





A THEORY OF VALIDATION FOR DERIVATIVE SPECIFICATIONS 
AND CHECK LISTS¹


By Lee Byrne 
Visiting Professor of Secondary Education, New York University 


PART I. RESEARCH PRODUCTS WHICH MAY BE CLASSIFIED AS DERIVATIVE
SPECIFICATIONS AND CHECK LISTS


Meaning of Specification 


In specification something is assigned a specific character. The something 
to be thus assigned a specific character may be called the specificandum. The 
specific character assigned to the specificandum, or (as a second meaning) the 
act of so doing, may be called the specification. 

A proposition is the smallest unit in which it is possible to embody a complete 
thought and is ordinarily represented by a single sentence. In specification 
the characterization may be confined to a single proposition or it may be ex- 
tended to include an indefinitely large number of propositions. So a speci- 
fication may be embodied in a sentence, a paragraph, a chapter, or a whole book. 
No matter how far it is extended it will never give complete determination, as 
our knowledge cannot be made exhaustive or our control be given an absolute 
precision. 

In view of the meaning assigned to specification it is evident that very many 
books and monographs could in this sense be classified as specifications. 


Meaning of Derivative Specification 


There is a type of specification (book or monograph) which is developed by 
deriving it from a group or class of specifications which already exist. This 
class may be a total class of all such specifications, or a group of those accepted 
as authoritative, or a group of those taken to be representative. A specification 
derived in this manner may be called a derivative specification. As an example 
we could take almost any first-class work by a present-day historian; by his- 
torians it would be called “secondary” because it is based on study of pre- 
existent documents called “primary sources.” 


Meaning of Check List 


The act of deriving a product from a pre-existent set of documents may, as
we have seen, take the form of a derivative specification, embracing an
assemblage of determinates or determinations. On the other hand the product
derived may be intended merely to indicate the ground covered or to be covered
by determination, without actually selecting the particular determinations.
Such a product will be called a check list. The term is not a very happy one,
but it is in very common use. If we think of a specification as an assemblage
of determinations then a check list could be thought of as a corresponding set
of determinables.² Since any determinable is capable of an indefinite number
of determinations it is evident that a long check list could give rise to an
extremely large number of different specifications, of which, of course, some
fraction might prove undesirable, inadmissible, or false.

1 This paper is an amplification of a report made in the statistical section of the
American Educational Research Association at its meeting in February, 1931.


Modes of Specification: How We Specify 


If we examine any specification to see how the specifying is done we shall 
find that it ultimately takes the form of specification under aspects. The fol- 
lowing diagram indicates the principal (perhaps all the) possibilities in the way 
of specification. 

Naming the original or main specificandum 
Naming an aspect 
Characterization of the specificandum under the aspect named 


Naming a relation (includes process, operation etc.) 
Naming an aspect of the relation 
Characterization of the relation under aspect named 


Naming a relatum or thing related (a new specificandum)
Naming an aspect of the relatum 
Characterization of relatum under aspect named 


Naming a part 
Naming an aspect of the part 
Characterization of the part under aspect named 


(The naming of aspects may be merely implicit but it is always present in 
principle.) 


2 On the notion of the “determinable,” which is due to W. E. Johnson, see his Logic,
Cambridge University Press (1921), Part I, p. xxxv and Chapter XI.










Thus it appears that if specification is pressed far enough it always ultimately 
becomes specification under aspects. Aspect and determinable may be re- 
garded as synonyms. 


Current Examples of Derivative Specifications and Check Lists 


At the present time it will be found that we have very many products of 
research which take forms capable of being classified as some kind of derivative 
specification or (derivative) check list in the senses in which these expressions 
have been explained. 

I have distinguished more than twenty different logical types of derivative 
specification or check list which are exemplified in the current literature of 
educational research and related subjects. However, space will not permit
exhibition of examples of these different types. 


Part II. VALIDATION OF DERIVATIVE SPECIFICATIONS AND CHECK LISTS


Many research products may be classified as derivative specifications or check 
lists, derivative in the sense that they have been derived from a group of docu- 
ments (books, articles, journals, newspapers, courses of study, etc.) through 
analysis of their content. Such source documents themselves we shall call 
specifications or groups of specifications. 

The only validation problem raised here is the question whether the resulting 
check list or derivative specification truly represents the class of source specifi- 
cations used. The further question whether the class of source specifications 
itself constitutes a satisfactory source is not discussed. 

From this point of view, if a check list or derivative specification is based in 
some suitable manner on all the documents of the class represented, no real 
validation problem arises; the validity has to be regarded as perfect. 

It may often happen that the investigator does not wish to analyse all of 
the specifications of the class in question but prefers to save time and labor by 
confining his analysis to a select group drawn from the total class as a sample. 
In this case the problem arises as to how far results based on such a sample should
be judged to be truly representative of the entire class of specifications (most 
of which have not been analysed). A problem of this nature may be called the 
problem of validity for this kind of work. 

Such a validation problem appears to take the same form whether the product 
to be validated is a derivative specification or (derivative) check list. Accord- 
ingly we shall for the sake of brevity carry on the discussion by referring to the 
problem as that of validating (derivative) check lists. The same principles 
would apply if the product happened to be a derivative specification. 

In order to consider the validity of a check list based on a sample group of 
specifications (called here a Sample Check List) we may hypothesize a check 
list based in the same manner on the entire class of specifications from which 
the sample was drawn. Such a hypothetical check list (which is not made) 
will be called the Ideal Check List. Then the problem of validity may be
conceived as the question as to how far the content of the Sample Check List agrees
with the unknown content of the Ideal Check List.

An overlapping of the two appears ordinarily to be certain but a failure of 
complete coincidence is very highly probable. The question is what degree of 
coincidence is to be expected. 

This general validity problem naturally divides into two separate questions. 
The first question asks what proportion of the content of the Sample Check 
List may be expected to be present also in the Ideal Check List; this may be 
called the (sub-) problem of reliability. The second question asks what propor- 
tion of the content of the Ideal Check List may be expected to be present in the 
Sample Check List; this may be called the (sub-) problem of completeness. 
The answers to these two problems, if expressed in numerical percentages, could 
be called the Index of Reliability and Index of Completeness respectively. 

We shall first consider these two problems in their simplest form and after- 
ward in a more complex form in which they exhibited themselves in a recent 
study by the writer.³ The simple case presents no great difficulty and it is
possible that a different method of disposing of it might be preferred. The more 
complex case, however, appears to be rather difficult of solution and the writer 
has not been able to find in the literature any developed technique for handling 
it. The simple case is presented here primarily because it affords, by further 
extension, a successful approach to the difficult problem of the more com- 
plex case. 


Simple Case 
Terms and Symbols 


The “class of specifications” will be understood to consist of all specifications 
which belong to the whole class of specifications regarded as a source, a class 
which we claim to represent in our final product. In this problem the “class”
will not be regarded as indefinitely large but as consisting of a definite number 
of specifications, a number to be ascertained by actual count or by careful 
estimate. 

“Sample specifications” are the limited group selected from the class for
purposes of actual analysis, and which play the rôle of representing the whole
class. The remaining specifications of the class are not analyzed.

“Sample Check List Material” is a name for the assemblage of all the different
items found in one or more sample specifications.

“Ideal Check List Material” is a name for a hypothetical assemblage of all
the different items found in one or more specifications in the class. Only those
appearing in some sample specifications can be actually known; the rest are
hypothetical.


3 Byrne, L. Check List Materials for Public School Building Specifications. Teachers 
College, Columbia University. 1931. 










Write 


M (constant) = total number of specifications in class 

N (variable) = number of these specifications in which a particular item 
under consideration appears (this number is hypothetical and some 
of the particular items themselves are hypothetical) 

m (constant) = number of sample specifications 

n (variable) = number of sample specifications in which a particular 
(the same) item appears 


Values of n may be expected to vary for different items, from m to 0 by inter- 
vals of 1, the zero value appertaining to any item wholly absent from the Sample 
Check List Material (hypothetically present in Ideal Check List Material). 

Values of N might be expected to vary, for different items, from M to 1 by
intervals of 1. But in this problem the convention will be adopted that the
range is from M downward by intervals of $M/m$. Thus if the number M should
be five times as large as the number m then the range for N would be treated
as proceeding from M downward by intervals of 5: M, M − 5, M − 10, ..., 5.
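The convention just described can be sketched in a few lines. This is only an illustration: the function name `ideal_cell_values` is hypothetical, and the sketch assumes that M is an exact multiple of m, which the text's example (M five times m) satisfies.

```python
def ideal_cell_values(M, m):
    """Values of N permitted by the convention: M, M - M/m, M - 2M/m, ..., M/m."""
    if M % m != 0:
        # Assumption of this sketch only: M is an exact multiple of m.
        raise ValueError("this sketch assumes M is a multiple of m")
    step = M // m
    return list(range(M, 0, -step))


# With M = 50 and m = 10 the interval is 5, as in the example in the text.
values = ideal_cell_values(50, 10)
```

Each permitted value of N then corresponds to exactly one sample value n = 1, ..., m through N = n(M/m).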

A “tabulation” will mean a statistical table showing how many different 
items appear in every possible number of specifications. A tabulation must be 
made by actual count for the items of the sample specifications, and will show 
the number of items having each possible value of n. A similar tabulation is 
hypothetical for the items in all the specifications of the class, that is for the 
number of items having each value of N permitted by the convention of the 
last paragraph. 
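As a minimal sketch of how such a tabulation might be made by actual count, suppose each sample specification is recorded as a set of item labels. The item labels and the function name `tabulate` below are hypothetical and serve only as illustration.

```python
from collections import Counter


def tabulate(sample_specs):
    """Map each value of n to the number of different items that appear
    in exactly n of the sample specifications."""
    occurrences = Counter()
    for spec in sample_specs:
        occurrences.update(set(spec))  # a specification counts an item once
    return dict(Counter(occurrences.values()))


# Hypothetical sample of m = 4 specifications.
sample = [
    {"marble", "plaster", "glazing"},
    {"marble", "plaster"},
    {"plaster", "glazing", "roofing"},
    {"plaster"},
]
cells = tabulate(sample)
```

Here "plaster" appears in all four specifications, "marble" and "glazing" in two each, and "roofing" in one. The cell n = 0 cannot be filled by count, since items wholly absent from the sample are not observed.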

“Tabulation cell” (or simply “cell”) will mean, as needed, either the number
of items or the group of items appearing in any designated number of specifi- 
cations. For Sample Check List Material it will be the number or group of 
items to which a particular value of n appertains; for Ideal Check List similarly 
the number of items or group of items to which a particular value of N appertains 
(hypothetically). 

“Sample Check List” will mean a list of items selected from the Sample
Check List Material according to some adopted criterion. For illustrative 
purposes we shall consider this criterion to be, for example, the numerical 
ratio n = > 

“Ideal Check List” will mean a list of items selected from the Ideal Check
List Material according to some adopted criterion. For illustrative purposes 


. ali ik 7 ‘ M 
we shall consider this criterion to be the numerical ratio N = a 


Problem of Reliability 


The problem of reliability may be restated and renamed the General Reliability
Problem. This may be broken up into a group of problems which will
be called Elementary Reliability Problems. Each of the latter may be in turn
broken up into a group of problems which will be called Ultimate Reliability
Problems. Each Ultimate Reliability Problem may be solved directly. Combination
of these solutions will yield solutions of the Elementary Reliability
Problems. Combinations of the latter solutions will finally yield the solution
of the General Reliability Problem.

These problems will now be stated.

General Reliability Problem: What proportion of the items present in Sample 
Check List may be expected to be present also in Ideal Check List? 

Elementary Reliability Problem: What proportion of the items in a particular 
cell in Sample Check List may be expected to be present also in Ideal Check 
List? 

Ultimate Reliability Problem: What proportion of the items in a particular 
cell in Sample Check List may be expected to be present also in some designated 
cell in Ideal Check List? 

To solve an Ultimate Problem: 

From the Fundamental Theorem in the Theory of Inductive Probability
(Whittaker, E. T. and Robinson, G. The Calculus of Observations. London:
Blackie & Son. 1924. p. 305) the solution may be expressed as

$$\frac{P_R\, p_s}{\sum P\, p}.$$





Whittaker and Robinson’s statement of the Fundamental Theorem in the
Theory of Inductive Probability is as follows (form slightly changed without
change in meaning):

“Suppose that a certain observed phenomenon may be accounted for by any
one of a certain number of hypotheses, of which one, and not more than one,
must be true: suppose moreover that the probability of the R-th hypothesis,
as based on information in our possession before the phenomenon is observed,
is $P_R$, while the probability of the observed phenomenon, on the assumption of
the truth of the R-th hypothesis, is $p_s$. Then when the observation of the
phenomenon is taken into consideration, the probability of the R-th hypothesis is

$$\frac{P_R\, p_s}{\sum P\, p}$$

where the symbol $\sum$ denotes the summation over all the hypotheses.”⁴

4 For the fundamental position of this theorem in a theory of science and for its proof
one may also consult Jeffreys, H. Scientific Inference. Cambridge: Cambridge University
Press. 1931. Chapter II (section 2.34).

It is clear that an Ultimate Reliability Problem is a case falling under this
Fundamental Theorem. The observed phenomenon is any item occurring in
any specified cell of Sample Check List, say cell n = s. It may be accounted
for by a certain number of hypotheses as to its source in the Ideal Check List
Material; the different cells in the Ideal Check List Material are these different
hypotheses of origin, hypothetical because we do not know from which one it
has come but only that it must have come from some one of them; the cell from
which it actually comes is the true hypothesis, though we do not know which
one that is. That the origin of the item is in cell N = R is the R-th hypothesis,
and its probability is written $P_R$. The probability of the occurrence of the
phenomenon on the assumption of the truth of the R-th hypothesis is the
probability that an item in cell N = R will appear in Sample Check List in cell n = s
and its probability is written $p_s$. As we clearly have in our Ultimate Reliability
Problem a case falling under the Fundamental Theorem quoted we may accept
as the required solution of the Ultimate Reliability Problem the formula already
given in the initial statement:

$$\frac{P_R\, p_s}{\sum P\, p}$$


This expresses the probability that any item found in Sample-Check-List cell
n = s comes from (and appears in) Ideal-Check-List-Material cell N = R, or
it gives the proportion of items found in Sample-Check-List cell n = s that
may be expected to come from (or appear in) Ideal-Check-List-Material cell
N = R.

Meaning of any value of P (say $P_R$) = the probability that any item, drawn
at random from those cells of Ideal Check List Material which are possible
sources of items in Sample-Check-List cell n = s, will happen to be drawn from
cell N = R.

Meaning of any value of p (say $p_s$) = the probability that any item in
Ideal-Check-List cell N = R will also be present in Sample-Check-List cell n = s.
(Important: this supposition is not equivalent to its converse.)
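The way the two kinds of probability combine can be illustrated with a small sketch. The three hypotheses below stand for three possible source cells; the numerical values of P and p are invented purely for illustration, and the function name `posterior` is hypothetical.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """For each hypothesis return P * p divided by the sum of P * p
    over all hypotheses."""
    joint = [P * p for P, p in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("the observation is impossible under every hypothesis")
    return [j / total for j in joint]


# Hypothetical values: P for three source cells, and p for the observed cell n = s.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(3, 10), Fraction(6, 10)]
post = posterior(priors, likelihoods)
```

In this invented case the least probable source cell a priori (P = 1/6) becomes as probable as the second once the observation is taken into account, because items in it are far more likely to land in the observed sample cell.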

Evaluation of $P_R$:

$$P_R = \frac{\text{number of items in cell } N = R}{\text{number of items in all cells which are possible sources of items in cell } n = s}.$$

For this ratio it is necessary to assume that the shape of the numerical curve
formed by the group of Ideal-Check-List-Material cells is the same as that of
the numerical curve formed by the group of Sample-Check-List-Material cells.
On this assumption we may replace the numerator by the number of items in
the Sample-Check-List-Material cell having an abscissa corresponding to that
of the Ideal-Check-List-Material cell N = R, and replace the denominator by
the sum of the numbers of items in all the cells with abscissae corresponding to
those of Ideal-Check-List-Material cells which are possible sources of items in
cell n = s.
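The replacement described above might be sketched as follows. The sketch makes two assumptions that are not stated in this form in the text: the sample cell n is taken to correspond to the ideal cell N = n(M/m), and a cell N is treated as a possible source of items in cell n = s whenever N ≥ s and M − N ≥ m − s, that is, whenever an item in it could fall in that sample cell at all. The tabulation counts and the function name are hypothetical.

```python
from fractions import Fraction


def prior_over_source_cells(sample_cells, M, m, s):
    """Return {N: P} for the possible source cells of sample cell n = s,
    using the sample tabulation {n: number of items} as the assumed shape
    of the Ideal-Check-List-Material tabulation."""
    if M % m != 0:
        raise ValueError("this sketch assumes M is a multiple of m")
    step = M // m
    possible = {}
    for n_corr, count in sample_cells.items():
        N = n_corr * step  # ideal cell with the corresponding abscissa
        if N >= s and M - N >= m - s:  # an item here can land in cell n = s
            possible[N] = count
    total = sum(possible.values())
    return {N: Fraction(count, total) for N, count in possible.items()}


# Hypothetical sample tabulation for m = 4, with M = 20.
sample_cells = {1: 40, 2: 30, 3: 20, 4: 10}
prior = prior_over_source_cells(sample_cells, M=20, m=4, s=2)
```

Cell N = 20 drops out here: an item present in every specification of the class cannot appear in only two of four sample specifications.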

Evaluation of $p_s$:

By the aid of “the definition of probability which is used in practically all
treatises on the subject” (Coolidge, J. L. An Introduction to Mathematical
Probability. Oxford: Oxford University Press. 1925. p. 4) and the principle
underlying the Theory of Combinations (Whitworth, W. A. Choice and Chance.
New York: G. E. Stechert & Co. 1927. Proposition II) we are able to arrive
at the evaluation:

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}$$

in which, for any p (say $p_s$), we employ for N the value N = R, and for n the
value n = s. As the denominator later cancels out it may be disregarded
throughout, simplifying the formula to

$$p = C^{M-N}_{m-n}\, C^{N}_{n}.$$

(A symbol such as $C^{N}_{n}$ is read “the number of combinations of N things taken
n at a time”; also written in several other forms.)

The definition referred to may be worded as follows (Coolidge’s own preferred 
definition is not quite the same): 

“An event can happen in a certain number of ways, which are all equally 
likely. A certain proportion of these are classed as favorable. The ratio of 
the number of favorable ways to the total number is called the probability that 
the event will turn out favorably.” 

The principle underlying the Theory of Combinations may be quoted from 
Whitworth as follows (also found in ordinary works on algebra): 

“If one operation can be performed in m ways, and then a second can be
performed in n ways, and then a third in r ways, (and so on), the number of ways
of performing all the operations will be m × n × r × etc.”

If it is not at once clear that the formula for evaluation of p follows from the 
definition and principle just quoted, the following considerations should make 
it evident. 

We are working in terms of a particular item belonging to a particular
Ideal-Check-List-Material cell, say cell N = R. “Favorable” occurrence requires
that this item fall in a particular Sample-Check-List cell, say n = s, while
falling in any other Sample-Check-List-Material cell (including cell n = 0 for
absence) is “unfavorable.” Again the real meaning of the “favorable” occurrence
is that the item will be found in just n = s out of the m specifications of
the sample, and absent in the remaining m − n specifications of the sample.
Moreover presence in Ideal-Check-List-Material cell N = R means that the
item occurs in just N = R of the M specifications that constitute the whole
class and is absent in M − N of these specifications. The total number of all
the ways (favorable and unfavorable) in which our event can happen means the
same as the total number of all the ways in which a group of m specifications
can be selected from a larger group of M, and this is, of course, written
$C^{M}_{m}$ and given us in our denominator. The number of favorable ways in which our
event can happen means the same as the number of ways in which N specifications
containing the item can form groups of n specifications while at the
same time M − N specifications not containing the item can form groups of
m − n specifications; the first distribution can be done in $C^{N}_{n}$ ways and the
second in $C^{M-N}_{m-n}$ ways, so by Whitworth’s principle the number of ways in which
these things can happen simultaneously is $C^{M-N}_{m-n}\, C^{N}_{n}$. Assembling numerator
and denominator we have the formula initially stated for evaluation of p, viz.:

$$p = \frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

This is the general formula; in applying to the particular example N = R, n = s
the replacements for N and n, of course, give

$$p_s = \frac{C^{M-R}_{m-s}\, C^{R}_{s}}{C^{M}_{m}}.$$
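The counting argument above can be checked on a small case by listing every possible sample. The following sketch compares the formula with a direct enumeration of all groups of m specifications drawn from M; the values M = 8, N = 3, m = 4 and the function names are chosen here only for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def p_formula(M, N, m, n):
    """C(M-N, m-n) * C(N, n) / C(M, m), the evaluation of p given in the text."""
    return Fraction(comb(M - N, m - n) * comb(N, n), comb(M, m))


def p_enumerated(M, N, m, n):
    """Count favorable samples directly. Specifications 0..N-1 are taken to
    contain the item; specifications N..M-1 do not."""
    favorable = 0
    total = 0
    for sample in combinations(range(M), m):
        total += 1
        if sum(1 for spec in sample if spec < N) == n:
            favorable += 1
    return Fraction(favorable, total)


M, N, m = 8, 3, 4
table = {n: p_formula(M, N, m, n) for n in range(m + 1)}
```

The values of p over all sample cells n = 0, 1, ..., m sum to one, since every sample places the item in exactly one cell (cell n = 0 standing for absence).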

Having a means of evaluating P and p we may solve all needed Ultimate
Problems. The resulting solutions of the needed Ultimate Reliability Problems
(not necessarily completed) enable us to arrive at the solution of any needed
Elementary Reliability Problem in the form of a percentage which may be
called an Index of Reliability for the Sample-Check-List cell in question. In
computing this percentage we distinguish source-cells that belong to the Ideal
Check List from other source-cells that belong to the Ideal Check List Material
but not to the Ideal Check List.

By properly averaging cell-Indices of Reliability (which are really Indices of 
Reliability for the individual items in the cells) we may obtain a solution of the 
General Problem of Reliability in the form of an Average Index of Reliability 
for the Sample Check List as a whole. 
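A minimal sketch of the computation of cell-Indices and of the Average Index follows. Several points are assumptions of the sketch rather than statements of the text: the sample cell n is matched to the ideal cell N = n(M/m); the criteria for the two check lists are passed in as thresholds (`ideal_min_N`, `sample_min_n`), whose numerical values are invented; and "properly averaging" is read as weighting each cell-Index by the number of items in the cell. The tabulation counts are hypothetical.

```python
from fractions import Fraction
from math import comb


def cell_reliability(s, sample_cells, M, m, ideal_min_N):
    """Index of Reliability for sample cell n = s: the share of the total
    weight P * p that comes from source cells inside the Ideal Check List."""
    step = M // m
    weights = {}
    for n_corr, count in sample_cells.items():
        N = n_corr * step
        # count is proportional to P; the denominator C(M, m) of p cancels.
        weights[N] = count * comb(M - N, m - s) * comb(N, s)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no cell is a possible source for this sample cell")
    inside = sum(w for N, w in weights.items() if N >= ideal_min_N)
    return Fraction(inside, total)


def average_reliability(sample_cells, M, m, ideal_min_N, sample_min_n):
    """Average of the cell-Indices over the cells of the Sample Check List,
    weighted by the number of items in each cell (an assumed weighting)."""
    kept = {n: c for n, c in sample_cells.items() if n >= sample_min_n}
    weighted = sum(c * cell_reliability(n, sample_cells, M, m, ideal_min_N)
                   for n, c in kept.items())
    return weighted / sum(kept.values())


# Hypothetical tabulation and criteria: M = 20, m = 4, N >= 10, n >= 2.
sample_cells = {1: 40, 2: 30, 3: 20, 4: 10}
r2 = cell_reliability(2, sample_cells, 20, 4, 10)
r3 = cell_reliability(3, sample_cells, 20, 4, 10)
r4 = cell_reliability(4, sample_cells, 20, 4, 10)
avg = average_reliability(sample_cells, 20, 4, ideal_min_N=10, sample_min_n=2)
```

An Average Index for a briefer Sample Check List would be obtained in the same way by raising `sample_min_n` while keeping `ideal_min_N` fixed.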

In addition to the Average Index of Reliability for the Sample Check List 
we may easily secure also Average Indices of Reliability for any series of briefer 
Sample Check Lists selected from the Sample Check List, by properly averaging 
the Indices of cells contained in any Sample Check List in question, keeping 
the original criterion for Ideal Check List. 

In practice it may not be necessary to compute all cell-Indices, as a portion 
of these may be entered in tables by any methods of interpolation regarded as 
acceptable. 


Problem of Completeness 


Again we have General, Elementary, and Ultimate Problems. These may 
be stated as follows: 

General Completeness Problem: What proportion of the items present in 
Ideal Check List may be expected to be present also in Sample Check List? 

Elementary Completeness Problem: What proportion of the items present in 
Ideal Check List may be expected to be present also in some designated cell in 
Sample Check List? 

Ultimate Completeness Problem: What proportion of the items in a particu- 
lar cell in Ideal Check List may be expected to be present also in some designated 
cell in Sample Check List? 
















To solve an Ultimate Problem: 

From principles already used the proportion to be expected is the same as the
value of p alone in an Ultimate Reliability Problem, viz.:

$$\frac{C^{M-N}_{m-n}\, C^{N}_{n}}{C^{M}_{m}}.$$

By the use of this formula we may solve the Ultimate Problems for all values 
of N represented in Ideal Check List and all values of n represented in Sample 
Check List; some of these solutions will have a value of zero. 

For each value of n, if we properly average the solutions of the Ultimate 
Problems, we obtain a solution of the Elementary Problem for one Sample- 
Check-List cell in the form of a percentage which may be called the Index of 
Completeness for the particular Sample-Check-List cell. In securing this 
average it is necessary to multiply each Ultimate Problem solution by a relative 
number corresponding to the assumed ratio of number of items in the particular 
Ideal-Check-List cell to the number of items in all the Ideal-Check-List cells. 
The source of the assumed relative numbers is the same as that used in evaluat- 
ing P in the Reliability Problem. 

When we have an Index of Completeness for each Sample-Check-List cell 
we may obtain a Total Index of Completeness for the Sample Check List as a 
whole by summing the cell-Indices of Completeness of all the cells of the Sample 
Check List. By an equivalent but preferable method we may divide the last- 
named result by the sum of the cell-Indices of Completeness of all the cells of 
the Sample Check List Material (including cell n = 0); by this method the 
$C^{M}_{m}$ of the original formula cancels out and so may be disregarded throughout.
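The second ("equivalent but preferable") method might be sketched as follows, working with numerators only so that the common denominator never appears. As before, the correspondence N = n(M/m), the threshold criteria and the tabulation counts are assumptions introduced for illustration, not values from the text.

```python
from fractions import Fraction
from math import comb


def completeness_index(sample_cells, M, m, ideal_min_N, sample_min_n):
    """Total Index of Completeness for the Sample Check List."""
    step = M // m
    # Assumed relative numbers of items in the Ideal-Check-List cells.
    ideal = {k * step: c for k, c in sample_cells.items() if k * step >= ideal_min_N}

    def cell_index(n):
        # Unnormalised cell-Index: the denominator C(M, m) is left out.
        return sum(c * comb(M - N, m - n) * comb(N, n) for N, c in ideal.items())

    kept = sum(cell_index(n) for n in range(sample_min_n, m + 1))
    everything = sum(cell_index(n) for n in range(0, m + 1))  # includes cell n = 0
    return Fraction(kept, everything)


sample_cells = {1: 40, 2: 30, 3: 20, 4: 10}
index = completeness_index(sample_cells, M=20, m=4, ideal_min_N=10, sample_min_n=2)
```

With these invented figures roughly 84 per cent of the items of the Ideal Check List would be expected to appear in the Sample Check List.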

A Total Index of Completeness is similarly obtainable for a Sample Check 
List (any Sample Check List selected from the Sample Check List) by summing 
the cell-Indices of Completeness of the appropriate cells. Thus, if desired, a 
tabulation may be made showing Indices of Completeness for a series of Sample 
Check Lists differing in extent. 

A combined tabulation may show for each of a series of Sample Check Lists 
its Index of Reliability and its Index of Completeness. 


More Complex Case 


So far we have considered a validation problem of simple type. In the writer’s
Check List Materials for Public School Building Specifications⁵ a more complex
problem was presented, due to the introduction of the concept of the Applicable
Case. A Check List for School Building Specifications was developed with a
view to its use by school officials or others as an aid in judging proposed school
building specifications with reference to their completeness or incompleteness
of determination. The position was taken that a new specification ought not
to be charged with the omission of a given item unless the building (as
represented by the specification) had an Applicable Case for that item. To give a
single example, the Check List contains various items relating to the specifying
of marble work. It did not seem appropriate to score a specification down for
the omission of numerous determinations in marble work, if in fact there was no
marble in the building to be determined. This situation is expressed by saying
that there are no Applicable Cases for those items.

5 Byrne, L. Check List Materials for Public School Building Specifications. Teachers
College, Columbia University. 1931.

It seems likely that there are other research problems in which the question 
ought to be raised whether adequate treatment does not require the introduction 
of the concept of the Applicable Case. If so a more difficult validation problem 
is presented than would otherwise be the case. 

In the more complex case indicated, solution is obtained by making the
necessary extensions in the procedures followed for the simple case.


Modifications in Terms and Symbols 


M (constant) = total number of specifications in class 

D (variable) = number of these specifications containing an Applicable Case 
for a particular item 

N (variable) = number of the latter specifications which also contain the 
particular item 

m (constant) = number of specifications in sample 

d (variable) = number of these specifications containing an Applicable Case 
for the particular item 

n (variable) = number of the latter specifications which also contain the 
particular item 


Values of d range from m to 0 by intervals of 1, and those of n range from d 
to 0 by intervals of 1. 

The convention is adopted that values of D range from M downward, and
those of N from D downward, by intervals of $M/m$.


(Tabulation) cell will mean the number of items (or the group of items) having 
a common value of d and a common value of n. 
The criterion for membership in the Sample Check List may, for illustrative 


purposes, be taken as n = > 
The criterion for membership in the Ideal Check List may, for illustrative


purposes, be taken as N = = 


Problem of Reliability 


Following the same principle and line of reasoning as for the simple case we
arrive at the same general formula for the solution of an Ultimate Reliability
Problem, viz.:

$$\frac{P_R\, p_s}{\sum P\, p}.$$







Meanings of values of P and p are the same as before except that cells must be 
described respectively in terms of n and d values instead of n values alone, or 
N and D values instead of N values alone. 

$P_R$ is evaluated in the same manner as before, using the new meaning of
“cell.”


For $p_s$ the evaluation now becomes

$$p_s = \frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}}$$
which through cancellation may be simplified to the working formula

$$p_s = C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$


The reasoning leading to the denominator C¥ is unchanged and so this de- 
nominator itself remains unchanged. The numerator for the evaluation of p is 
altered to the extent shown by the consideration that, in producing ‘favorable’ 
ways, we now have to do with the number of simultaneous possibilities of draw- 
ing n specifications from a group of N specifications containing a particular item, 
drawing d — n specifications from a group of D — N specifications which con- 
tain an Applicable Case for this particular item but do not contain this item 
itself, and of drawing m — d specifications from a group of M — D specifications 
which contain no Applicable Case for the item. 
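The three simultaneous drawings described above can be written out directly. The sketch below forms the product of the three numbers of combinations and divides by the number of ways of drawing the whole sample; the numerical values (M = 12, D = 8, N = 5, m = 4) and the function name are hypothetical.

```python
from fractions import Fraction
from math import comb


def p_cell_applicable(M, D, N, m, d, n):
    """Probability that, of m sample specifications drawn from M, exactly d
    contain an Applicable Case for the item and exactly n of those contain
    the item itself."""
    favorable = (comb(N, n)             # specifications containing the item
                 * comb(D - N, d - n)   # Applicable Case present, item absent
                 * comb(M - D, m - d))  # no Applicable Case
    return Fraction(favorable, comb(M, m))


M, D, N, m = 12, 8, 5, 4
table = {(d, n): p_cell_applicable(M, D, N, m, d, n)
         for d in range(m + 1) for n in range(d + 1)}
```

Summed over every sample cell (d, n) the probabilities come to one, which gives a quick check that no way of drawing the sample has been counted twice or omitted.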


Problem of Completeness 


Following the same principles and line of reasoning as for the simple case we
arrive at the following formula for the solution of an Ultimate Completeness
Problem:

$$\frac{C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}}{C^{M}_{m}}.$$
By suitable treatment bringing about cancellations the working formula may
be reduced to

$$C^{M-D}_{m-d}\, C^{D-N}_{d-n}\, C^{N}_{n}.$$


Techniques and Aids in Computation 


The present paper is limited to an attempt to explain with adequate fullness 
the proposed theory of validation for derivative specifications and check lists, 
and space is lacking in which to exhibit techniques of actual computation. One 
specimen problem worked out in fairly complete detail, together with remarks 
on available aids in computation will be found in Appendix A3 in typewritten 
copies of the writer’s ‘‘Check List Materials for Public School Building Specifi- 
cations” on file in the Library of Teachers College, Columbia University; the 
Appendices are not included in the printed edition. 








A NOTE ON SHEPPARD’S CORRECTIONS 
By SOLOMON KULLBACK


In this note we shall derive a simple relation between the characteristic
function of the grouped distribution and the characteristic function of the
original continuous distribution, assuming that the frequency curve has high
contact with the x-axis at both ends.

If we set $p_s = \int_{x_s - w/2}^{x_s + w/2} f(x)\,dx$, then the characteristic function of the
grouped distribution is given by

$$\psi(t) = \sum_s e^{itx_s}\, p_s \tag{1}$$

where $i = \sqrt{-1}$.


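Replacing $p_s$ by its value as given above, we have

$$
\begin{aligned}
\psi(t) &= \sum_s e^{itx_s} \int_{x_s - w/2}^{x_s + w/2} f(x)\,dx \\
&= \sum_s e^{itx_s} \int_{-w/2}^{w/2} f(x + x_s)\,dx \\
&= \int_{-w/2}^{w/2} e^{-itx} \sum_s e^{it(x + x_s)} f(x + x_s)\,dx \\
&= \sum_s e^{itx_s} f(x_s) \int_{-w/2}^{w/2} e^{-itx}\,dx.
\end{aligned}
\tag{2}
$$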


There is no difficulty about justifying the inversion of the order of integration 
and summation. 


Because of the assumption of high contact with the axis of x at both ends of
the frequency curve, we have

$$\varphi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\,dx = w \sum_s e^{itx_s} f(x_s) \tag{3}$$

so that

$$\psi(t) = \frac{2}{wt} \sin\frac{wt}{2}\; \varphi(t). \tag{4}$$
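The relation between the two characteristic functions, ψ(t) = (2/wt) sin(wt/2) φ(t), can be checked numerically for a frequency curve with high contact. The sketch below assumes a normal curve, whose characteristic function is known in closed form, grouped into cells of width w centred on class marks x_s = sw; the parameter values and function names are chosen only for illustration.

```python
import cmath
import math


def grouped_cf(t, mu, sigma, w, half_range=12.0):
    """psi(t) = sum over s of exp(i t x_s) p_s for a grouped normal curve."""
    def Phi(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    s_max = int(half_range * sigma / w) + 1
    centre = round(mu / w)
    total = 0j
    for s in range(centre - s_max, centre + s_max + 1):
        x_s = s * w
        p_s = Phi(x_s + w / 2) - Phi(x_s - w / 2)  # frequency in the cell
        total += cmath.exp(1j * t * x_s) * p_s
    return total


def continuous_cf(t, mu, sigma):
    """phi(t) for the normal curve with mean mu and standard deviation sigma."""
    return cmath.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)


t, mu, sigma, w = 0.7, 0.3, 1.0, 0.5
lhs = grouped_cf(t, mu, sigma, w)
rhs = (2.0 / (w * t)) * math.sin(w * t / 2.0) * continuous_cf(t, mu, sigma)
```

For this curve and this class interval the two sides agree to within rounding error; for curves without high contact the agreement would not be expected to be so close.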












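This is the desired result, from which there follow the desired moment
relations by equating coefficients of $(it)^n$ on both sides of the equation. For
example:

$$
1 + M_1 (it) + \frac{M_2}{2!}(it)^2 + \frac{M_3}{3!}(it)^3 + \cdots
= \left(1 + \frac{(it)^2 w^2}{2^2\, 3!} + \frac{(it)^4 w^4}{2^4\, 5!} + \cdots\right)
\left(1 + m_1 (it) + \frac{m_2}{2!}(it)^2 + \cdots\right)
$$

$$
= 1 + m_1 (it) + \frac{(it)^2}{2!}\left(m_2 + \frac{w^2}{12}\right)
+ \frac{(it)^3}{3!}\left(m_3 + \frac{m_1 w^2}{4}\right) + \cdots
$$

or

$$M_1 = m_1; \qquad M_2 = m_2 + \frac{w^2}{12}; \qquad M_3 = m_3 + \frac{m_1 w^2}{4}.$$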
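The moment relations M_1 = m_1, M_2 = m_2 + w²/12, M_3 = m_3 + m_1 w²/4 can be illustrated numerically. The sketch below again assumes a normal curve (mean 0.3, standard deviation 1, class interval 0.5, all chosen arbitrarily), computes the moments about the origin of the grouped distribution by direct summation, and sets them beside the moments of the continuous curve.

```python
import math


def grouped_moments(mu, sigma, w, half_range=12.0):
    """First three moments about the origin of a normal curve after grouping
    into cells of width w centred on class marks x_s = s * w."""
    def Phi(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

    s_max = int(half_range * sigma / w) + 1
    centre = round(mu / w)
    M1 = M2 = M3 = 0.0
    for s in range(centre - s_max, centre + s_max + 1):
        x_s = s * w
        p_s = Phi(x_s + w / 2) - Phi(x_s - w / 2)
        M1 += x_s * p_s
        M2 += x_s ** 2 * p_s
        M3 += x_s ** 3 * p_s
    return M1, M2, M3


mu, sigma, w = 0.3, 1.0, 0.5
# Moments about the origin of the continuous normal curve.
m1 = mu
m2 = mu ** 2 + sigma ** 2
m3 = mu ** 3 + 3 * mu * sigma ** 2
M1, M2, M3 = grouped_moments(mu, sigma, w)
```

Subtracting w²/12 from the grouped second moment recovers the second moment of the continuous curve, which is the familiar form of Sheppard's correction.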


WASHINGTON, D. C.








THE LIMITING DISTRIBUTIONS OF CERTAIN STATISTICS¹
By J. L. DOOB


There have been many advances in the theory of probability in recent years, 
especially relating to its mathematical basis. Unfortunately, there appears to 
be no source readily available to the ordinary American statistician which 
sketches these results and shows their application to statistics. It is the purpose 
of this paper to define the basic concepts and state the basic theorems of prob- 
ability, and then, as an application, to find the limiting distributions for large 
samples of a large class of statistics. One of these statistics is the tetrad differ- 
ence, which has been of much concern to psychologists. 


I 


Let $F(x)$ be a monotone non-decreasing function, continuous on the left, defined at every point of the $x$-axis, and satisfying the conditions

(1)   $\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) = 1.$

Then the function $F(x)$ is said to be the distribution function of a chance variable $\mathbf{x}$, and $F(x)$ is said to be the probability that $\mathbf{x} < x$. The curve $y = F(x)$ is sometimes called the ogive in statistics. The chance variable $\mathbf{x}$ itself is merely the function $x$, taken in conjunction with the monotone function $F(x)$.

If $\int_{-\infty}^{\infty} x\,dF(x)$ exists as an absolutely convergent Stieltjes integral, the value of the integral is called the expectation of $\mathbf{x}$, and will be denoted by $E(\mathbf{x})$.
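
As an illustration of these definitions, here is a small sketch for a discrete chance variable. The fair die and the names `values`, `probs`, `F`, and `expectation` are assumptions made for the example only. It shows that defining $F(x)$ as the probability of the strict inequality makes $F$ continuous on the left, and that the Stieltjes integral reduces to a sum over the jumps of $F$.

```python
from fractions import Fraction

# Hypothetical example: a fair die, values 1..6 each with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

def F(x):
    """F(x) = probability that the chance variable is < x (strict inequality)."""
    return sum((p for v, p in zip(values, probs) if v < x), Fraction(0))

def expectation():
    """Stieltjes integral of x dF(x); for a step function, a sum over the jumps."""
    return sum(v * p for v, p in zip(values, probs))

print(F(3), F(Fraction(301, 100)), expectation())
```

At a jump point such as $x = 3$ the function takes the value it approaches from the left, while just to the right of the jump it has already increased by the probability of the value 3.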


II 


Let $F(x_1, \cdots, x_n)$ be a function defined over n-dimensional space, which is monotone, non-decreasing, continuous on the left in each coördinate if the others are held fast, and which satisfies the conditions

(2)   $\lim_{x_j \to -\infty} F(x_1, \cdots, x_n) = 0, \quad j = 1, \cdots, n, \qquad \lim_{x_1, \cdots, x_n \to \infty} F(x_1, \cdots, x_n) = 1$

where in the last limit, $x_1, \cdots, x_n$ become infinite together. Then $F(x_1, \cdots, x_n)$ is said to be the distribution function of a set of chance variables $\mathbf{x}_1, \cdots, \mathbf{x}_n$, and $F(x_1, \cdots, x_n)$ is said to be the probability that all the inequalities $\mathbf{x}_j < x_j$, $(j = 1, \cdots, n)$, hold simultaneously. It can be shown that the function $F_j(x) = \lim_{\xi_1, \cdots, \xi_{n-1} \to \infty} F(\xi_1, \cdots, \xi_{j-1}, x, \xi_j, \cdots, \xi_{n-1})$ is of the type discussed in §I. The


1 Research under a grant-in-aid from the Carnegie Corporation. 


function $F_j(x)$ is called the distribution function of $\mathbf{x}_j$. The chance variables $\mathbf{x}_1, \cdots, \mathbf{x}_n$ are called independent if $F(x_1, \cdots, x_n) = \prod_{j=1}^{n} F_j(x_j)$. The chance variables $\mathbf{x}_1, \cdots, \mathbf{x}_n$ are merely the functions $x_1, \cdots, x_n$ defined over n-dimensional space, taken in conjunction with the function $F(x_1, \cdots, x_n)$.

If $a_1, \cdots, a_n$ are any real numbers, the number $F(a_1, \cdots, a_n)$, the probability that $\mathbf{x}_j < a_j$, $j = 1, \cdots, n$, is also called the probability that a sample $(x_1, \cdots, x_n)$ shall be in the region of n-dimensional space determined by $x_j < a_j$, $j = 1, \cdots, n$. Thus regions of this special type have probabilities attached to them. Using the usual additivity rules, probabilities can be attached to more general regions, and in fact probability can be defined on a collection C of regions including all open sets, closed sets and all sets which can be obtained from them by repeatedly taking sums, products, and complements. (Such point sets are called Borel measurable). The resulting function of point sets is non-negative and completely additive.²

If $f(x_1, \cdots, x_n)$ is any function of $x_1, \cdots, x_n$, let $E_x$ be the set of points $(x_1, \cdots, x_n)$ where $f < x$. Suppose that $E_x$ is in the collection C for all values of $x$, and let $F(x)$ be the probability attached to the set $E_x$. Then it is readily seen that $F(x)$ has the properties discussed in §I and is therefore the distribution function of a new chance variable $\mathbf{x}$, which will be denoted by $f(\mathbf{x}_1, \cdots, \mathbf{x}_n)$. The chance variable $f(\mathbf{x}_1, \cdots, \mathbf{x}_n)$ is merely the function $f(x_1, \cdots, x_n)$ taken in conjunction with the distribution function $F(x_1, \cdots, x_n)$. (An example is $f(x_1, \cdots, x_n) = x_1 + \cdots + x_n$, determining the chance variable $\mathbf{x}_1 + \cdots + \mathbf{x}_n$.) Suppose that $E(\mathbf{x})$ exists,

(3)   $E(\mathbf{x}) = \int_{-\infty}^{\infty} x\,dF(x).$

Then it can be shown that the n-dimensional (Lebesgue)-Stieltjes integral

(4)   $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,dF(x_1, \cdots, x_n)$

exists and has the value $E(\mathbf{x})$. Conversely the existence of the integral (4) implies that of (3).

If there is a Lebesgue-integrable function $g(x_1, \cdots, x_n)$ such that

(5)   $F(x_1, \cdots, x_n) = \int_{-\infty}^{x_n} \cdots \int_{-\infty}^{x_1} g(x_1, \cdots, x_n)\,dx_1 \cdots dx_n,$


2 That is, if $p(E)$ is the value of the set function on the set $E$, and if $E_1, E_2, \cdots$ are point sets with no common points, and which are in C, $p\left(\sum_{m=1}^{\infty} E_m\right) = \sum_{m=1}^{\infty} p(E_m)$.













the function $g$ is said to be the density function of the distribution. In this case (4) becomes

(4')   $\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, \cdots, x_n)\,g(x_1, \cdots, x_n)\,dx_1 \cdots dx_n.$


The probability attached to a point set E in the collection C is the integral 
(4) (or (4’) if there is a density function), where f = 1 over E and f = 0 else- 
where. 


III


Let $\mathbf{x}, \mathbf{x}_1, \mathbf{x}_2, \cdots$ be a sequence of chance variables. We suppose that for every integer $n$, $\mathbf{x}, \mathbf{x}_n$ determine a bivariate distribution. Then it is readily seen from §II that there is a chance variable $|\mathbf{x}_n - \mathbf{x}|$ and therefore that $P\{|\mathbf{x}_n - \mathbf{x}| \le \lambda\}$³ is defined for every number $\lambda$. If

(6)   $\lim_{n \to \infty} P\{|\mathbf{x}_n - \mathbf{x}| \le \lambda\} = 1$

for every positive number $\lambda$, the sequence $\mathbf{x}_n$ is said to converge stochastically, or to converge in probability, to $\mathbf{x}$. If $a$ is a constant, $P\{|\mathbf{x}_n - a| \le \lambda\}$ is also defined for every number $\lambda$, and there is a corresponding definition of stochastic convergence to $a$. The usual theorems about limits hold: if $\mathbf{x}_n, \mathbf{y}_n$ converge stochastically to $\mathbf{x}, \mathbf{y}$, then $\mathbf{x}_n + \mathbf{y}_n$ converges stochastically to $\mathbf{x} + \mathbf{y}$, etc.

An example of stochastic convergence is given by the law of large numbers. Let $\mathbf{x}$ be a chance variable with distribution function $F(x)$ and suppose that $E(\mathbf{x})$, $E(\mathbf{x}^2)$ exist, i.e. that

$\int_{-\infty}^{\infty} x\,dF(x), \qquad \int_{-\infty}^{\infty} x^2\,dF(x)$

are absolutely convergent integrals. Let $\mathbf{x}_1, \cdots, \mathbf{x}_n$ be chance variables whose n-variate distribution function is $\prod_{j=1}^{n} F(x_j)$: we are thus supposing that the variables all have the same distribution and form an independent set. Then $\frac{1}{n}\sum_{j=1}^{n} \mathbf{x}_j$ is a new chance variable, and Tchebycheff's inequality furnishes an immediate proof that $\frac{1}{n}\sum_{j=1}^{n} \mathbf{x}_j$ converges stochastically to $E(\mathbf{x})$.⁴
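
The Tchebycheff argument can be illustrated with a case in which the exact probability is computable. The sketch below is an assumption-laden example, not taken from the paper: the chance variables take the values 0 and 1 with probabilities $1 - p$ and $p$, and the function names are hypothetical. It compares the exact probability that the mean deviates from $E(\mathbf{x}) = p$ by more than $\lambda$ with the Tchebycheff bound $E\{[\mathbf{x} - E(\mathbf{x})]^2\}/(n\lambda^2)$.

```python
import math

def exact_tail(n, p, lam):
    """P{|mean - p| > lam} for the mean of n independent 0/1 variables."""
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > lam:
            total += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total

def tchebycheff_bound(n, p, lam):
    """Variance p(1 - p) divided by n * lam^2."""
    return p * (1 - p) / (n * lam ** 2)

p, lam = 0.5, 0.1
for n in (10, 100, 1000):
    print(n, exact_tail(n, p, lam), tchebycheff_bound(n, p, lam))
```

Both columns tend to 0 as $n$ increases, which is the content of (6) in this special case; the bound is crude but suffices for the proof.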


3 Throughout this paper, if y represents a set of conditions on chance variables, P{y} will denote the probability that those conditions are satisfied.


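4 If $\bar{\mathbf{x}}_n = \frac{1}{n}\sum_{j=1}^{n} \mathbf{x}_j$, then $E(\bar{\mathbf{x}}_n) = E(\mathbf{x})$ and $E\{[\bar{\mathbf{x}}_n - E(\mathbf{x})]^2\} = \frac{1}{n} E\{[\mathbf{x} - E(\mathbf{x})]^2\}$. Then if $\lambda$ is any positive number,

$P\{|\bar{\mathbf{x}}_n - E(\mathbf{x})| > \lambda\} \le \frac{E\{[\mathbf{x} - E(\mathbf{x})]^2\}}{n\lambda^2}$

which implies (6).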







There is also another kind of convergence, called convergence with probability 1. The sequence $\{\mathbf{x}_n\}$ converges with probability 1 to $\mathbf{x}$ if

(7)   $\lim_{n \to \infty} P\{|\mathbf{x}_n - \mathbf{x}| \le \lambda, |\mathbf{x}_{n+1} - \mathbf{x}| \le \lambda, \cdots, |\mathbf{x}_{n+p} - \mathbf{x}| \le \lambda\} = 1$

for every value of $p \ge 0$, uniformly in $p \ge 0$, for every positive number $\lambda$. If $p = 0$ in (7), (7) becomes (6), so that convergence with probability 1 implies stochastic convergence. Although the converse is not true, if $\{\mathbf{x}_n\}$ is a sequence of chance variables converging stochastically to $\mathbf{x}$, there is a subsequence of $\{\mathbf{x}_n\}$ which converges with probability 1 to $\mathbf{x}$.⁵ The usual limit theorems hold here also: if $\mathbf{x}_n, \mathbf{y}_n$ converge with probability 1 to $\mathbf{x}, \mathbf{y}$, then $\mathbf{x}_n + \mathbf{y}_n$ converges with probability 1 to $\mathbf{x} + \mathbf{y}$, etc.

An example of convergence with probability 1 is the following. If in the previous example the hypothesis that $E(\mathbf{x}^2)$ exists is removed, so that only the weaker hypothesis of the existence of $E(\mathbf{x})$ is supposed, the Tchebycheff inequality can no longer be applied, but a different method shows that $\frac{1}{n}\sum_{j=1}^{n} \mathbf{x}_j$ converges with probability 1 (and therefore stochastically) to $E(\mathbf{x})$.⁶ This result is known as the strong law of large numbers.


IV 


Let $\mathbf{x}, \mathbf{x}_1, \mathbf{x}_2, \cdots$ be a sequence of chance variables with distribution functions $F(x), F_1(x), F_2(x), \cdots$ respectively. Then if $\lim_{n \to \infty} F_n(x) = F(x)$ for every value of $x$, the distribution of $\mathbf{x}_n$ is said to converge to a limiting distribution with distribution function $F(x)$.

As an example, consider the Laplace-Liapounoff theorem. Let $\mathbf{x}_1, \mathbf{x}_2, \cdots$ be a sequence of independent chance variables (i.e. any finite number of them form an independent set) with the same distribution functions, and let $E(\mathbf{x}_1)$, $E(\mathbf{x}_1^2)$ exist. We suppose that $\sigma^2 = E\{[\mathbf{x}_1 - E(\mathbf{x}_1)]^2\} > 0$ so that the distribution of $\mathbf{x}_1$ is not merely confined to one point. Then the distribution of

(8)   $n^{-1/2} \sum_{j=1}^{n} [\mathbf{x}_j - E(\mathbf{x}_j)]$


5 The theories of probability and of measure are fundamentally identical. Chance variables correspond to measurable functions. Stochastic convergence corresponds to convergence in measure, and convergence with probability 1 corresponds to convergence almost everywhere. The relation between these two types of convergence is discussed (in the terminology of the measure theory) in E. W. Hobson, The Theory of Functions of a Real Variable, second edition, Vol. 2, pp. 239-244.

6 Cf. for instance J. L. Doob, Transactions of the American Mathematical Society, 
Vol. 36 (1934), pp. 764-765. 







converges to a limiting distribution with distribution function⁷

(9)   $\frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2\sigma^2}}\,dt.$
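
A quick numerical look at this theorem is possible when the distribution function of the sum (8) can be computed exactly. The sketch below assumes, purely for illustration, chance variables taking the values 0 and 1 with probability 1/2 each (so $\sigma = 1/2$); the function names are hypothetical. It compares the exact distribution function of (8) at a point $x$ with the value of (9).

```python
import math

def normal_cdf(x, sigma):
    """The distribution function (9) with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def exact_cdf(n, x):
    """P{ n^(-1/2) * sum_j (x_j - 1/2) < x } for n fair 0/1 variables."""
    threshold = n / 2 + x * math.sqrt(n)        # event: sum_j x_j < threshold
    count = sum(math.comb(n, k) for k in range(n + 1) if k < threshold)
    return count / 2 ** n

x, sigma = 0.5, 0.5
for n in (100, 400, 1600):
    print(n, exact_cdf(n, x), normal_cdf(x, sigma))
```

The discrepancy shrinks as $n$ grows; for a lattice distribution such as this one it is dominated by the size of the individual jumps of the exact distribution function.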

The convergence of a sequence of n-variate distributions is defined as the convergence of the distribution functions just as above for $n = 1$. Suppose that $(\mathbf{x}_{11}, \cdots, \mathbf{x}_{n1}), (\mathbf{x}_{12}, \cdots, \mathbf{x}_{n2}), \cdots$ are independent sets of chance variables (i.e. the distribution function of any finite number of sets is the product of the distribution functions of the sets) with the same distribution functions. We suppose that $E(\mathbf{x}_{j1})$, $E(\mathbf{x}_{j1}^2)$ exist, $j = 1, \cdots, n$, and that $\sigma_j^2 = E\{[\mathbf{x}_{j1} - E(\mathbf{x}_{j1})]^2\} > 0$. Then if $\bar{\mathbf{x}}_{jm} = m^{-1/2} \sum_{k=1}^{m} [\mathbf{x}_{jk} - E(\mathbf{x}_{jk})]$, the n-variate distribution of $\bar{\mathbf{x}}_{1m}, \cdots, \bar{\mathbf{x}}_{nm}$ converges to the normal distribution⁸ about zero means with variances $\sigma_1^2, \cdots, \sigma_n^2$, and correlation coefficients $\{\rho_{ij}\}$ where $\sigma_i\sigma_j\rho_{ij} = E\{[\mathbf{x}_{i1} - E(\mathbf{x}_{i1})][\mathbf{x}_{j1} - E(\mathbf{x}_{j1})]\}$.

Three lemmas will be needed below in applying these concepts. 

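LEMMA 1. If $\{\mathbf{x}_n\}$ is a sequence of chance variables whose distributions approach a limiting distribution and if $\{\mathbf{y}_n\}$ is a sequence of chance variables converging stochastically to 0, the sequence $\{\mathbf{x}_n\mathbf{y}_n\}$ converges stochastically to 0.

For if $F(x)$ is the distribution function of the limiting distribution, and if $\lambda$, $\mu$ are any positive numbers,

(10)   $P\{|\mathbf{x}_n\mathbf{y}_n| < \lambda\} \ge P\{|\mathbf{x}_n\mathbf{y}_n| < \lambda, |\mathbf{y}_n| \le \mu\} \ge P\{|\mathbf{x}_n| < \lambda/\mu, |\mathbf{y}_n| \le \mu\}$
       $\ge P\{|\mathbf{y}_n| \le \mu\} - P\{|\mathbf{x}_n| \ge \lambda/\mu\} = -P\{|\mathbf{y}_n| > \mu\} + P\{|\mathbf{x}_n| < \lambda/\mu\}$
       $\ge -P\{|\mathbf{y}_n| > \mu\} + P\{\mathbf{x}_n < \lambda/\mu\} - P\{\mathbf{x}_n < -\lambda/2\mu\}.$

Then, letting $n$ become infinite,

(11)   $\liminf_{n \to \infty} P\{|\mathbf{x}_n\mathbf{y}_n| < \lambda\} \ge F(\lambda/\mu) - F(-\lambda/2\mu).$⁹

Letting $\mu$ approach 0, $F(\lambda/\mu)$ approaches 1, $F(-\lambda/2\mu)$ approaches 0, and the right hand side becomes 1, as was to be proved.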

LEMMA 2. Let $\{\mathbf{x}_n\}$, $\{\mathbf{y}_n\}$, $\{\mathbf{z}_n\}$ be sequences of chance variables such that the distribution of $\mathbf{x}_n$ approaches a limiting distribution with continuous distribution function $F(x)$ and such that the sequences $\{\mathbf{y}_n\}$, $\{\mathbf{z}_n\}$ converge stochastically to 0, 1 respectively. Then the distributions of $\{\mathbf{x}_n/\mathbf{z}_n\}$¹⁰ and of $\mathbf{x}_n + \mathbf{y}_n$ approach limiting distributions with the same distribution function $F(x)$.

7 A. Khintchine, Ergebnisse der Mathematik, Vol. 2, No. 4: Asymptotische Gesetze der 
Wahrscheinlichkeitsrechnung, pp. 1-8. 

8 Ibid. pp. 11-16. 

9 If $\{a_n\}$ is a sequence of real numbers, $\limsup_{n \to \infty} a_n$ is defined as $\lim_{n \to \infty}$ {least upper bound $a_n, a_{n+1}, \cdots$}, and $\liminf_{n \to \infty} a_n$ is defined as $-\limsup_{n \to \infty} (-a_n)$. A necessary and sufficient condition that the sequence $\{a_n\}$ converge to a limit $a$ is that $\liminf_{n \to \infty} a_n = \limsup_{n \to \infty} a_n = a$.

10 Since $\mathbf{z}_n$ converges stochastically to 1, the probability that $\mathbf{z}_n = 0$ approaches 0. The theorem is independent of the way $\mathbf{x}_n/\mathbf{z}_n$ is defined when $\mathbf{z}_n = 0$.





















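Since $\frac{\mathbf{x}_n}{\mathbf{z}_n} = \mathbf{x}_n + \mathbf{x}_n \frac{1 - \mathbf{z}_n}{\mathbf{z}_n}$ (neglecting the possibility that $\mathbf{z}_n$ may vanish), where the last term converges stochastically to 0 by Lemma 1, it is sufficient to prove the second part of the theorem. If $\epsilon > 0$, and if $x$ is an arbitrary number,

(12)   $P\{\mathbf{x}_n + \mathbf{y}_n < x\} = P\{\mathbf{x}_n + \mathbf{y}_n < x, |\mathbf{y}_n| \le \epsilon\} + P\{\mathbf{x}_n + \mathbf{y}_n < x, |\mathbf{y}_n| > \epsilon\}.$

Since the sequence $\{\mathbf{y}_n\}$ converges stochastically to 0,

(13)   $\lim_{n \to \infty} P\{\mathbf{x}_n + \mathbf{y}_n < x, |\mathbf{y}_n| > \epsilon\} \le \lim_{n \to \infty} P\{|\mathbf{y}_n| > \epsilon\} = 0$

so that in the limit the second term in (12) can be neglected. Moreover

(14)   $P\{\mathbf{x}_n + \mathbf{y}_n < x, |\mathbf{y}_n| \le \epsilon\} \le P\{\mathbf{x}_n < x + \epsilon\}.$

If we let $n$ become infinite and then let $\epsilon$ approach 0, (14) becomes

(15)   $\limsup_{n \to \infty} P\{\mathbf{x}_n + \mathbf{y}_n < x\} \le F(x).$

A similar argument shows that

(16)   $\liminf_{n \to \infty} P\{\mathbf{x}_n + \mathbf{y}_n < x\} \ge F(x),$

and (15), (16) taken together imply that

(17)   $\lim_{n \to \infty} P\{\mathbf{x}_n + \mathbf{y}_n < x\} = F(x),$

as was to be proved.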
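LEMMA 3. If $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ are chance variables whose distribution has density function

$\frac{1}{(2\pi)^2} e^{-\frac{1}{2}(x_1^2 + x_2^2 + x_3^2 + x_4^2)}$

the distribution of $\mathbf{z} = \mathbf{x}_1\mathbf{x}_2 - \mathbf{x}_3\mathbf{x}_4$ has density function $\frac{1}{2}e^{-|z|}$.

The distribution of $\mathbf{u} = \mathbf{x}_1\mathbf{x}_2$ and that of $\mathbf{v} = -\mathbf{x}_3\mathbf{x}_4$ have the same density function:

(18)   $\frac{1}{\pi} \int_0^{\infty} e^{-\frac{t^2}{2} - \frac{x^2}{2t^2}}\,\frac{dt}{t}.$

Hence the distribution of $\mathbf{z}$ has density function

(19)   $\frac{1}{\pi^2} \int_{-\infty}^{\infty} \int_0^{\infty} \int_0^{\infty} e^{-\frac{t^2}{2} - \frac{(z - \lambda)^2}{2t^2} - \frac{\tau^2}{2} - \frac{\lambda^2}{2\tau^2}}\,\frac{dt\,d\tau\,d\lambda}{t\tau}.$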







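If we change to polar coördinates: $t = r\cos\theta$, $\tau = r\sin\theta$, and integrate out $\lambda$, we obtain

$\frac{\sqrt{2}}{\pi^{3/2}} \int_0^{\infty} \int_0^{\pi/2} e^{-\frac{r^2}{2} - \frac{z^2}{2r^2}}\,dr\,d\theta = \frac{1}{2} e^{-|z|}.$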
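
Lemma 3 lends itself to a simple sampling check. The following sketch is illustrative only (the sample size, the seed, and the function names are arbitrary choices, not part of the paper): it draws independent standard normal values, forms $\mathbf{z} = \mathbf{x}_1\mathbf{x}_2 - \mathbf{x}_3\mathbf{x}_4$, and compares the observed frequency of $|\mathbf{z}| < a$ with $1 - e^{-a}$, the value implied by the density $\frac{1}{2}e^{-|z|}$.

```python
import math
import random

def simulate_z(num_samples, seed=0):
    """Samples of z = x1*x2 - x3*x4 with x1..x4 independent standard normal."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) * rng.gauss(0, 1) - rng.gauss(0, 1) * rng.gauss(0, 1)
            for _ in range(num_samples)]

def prob_abs_less(a):
    """P{|z| < a} when z has density (1/2) e^{-|z|}."""
    return 1.0 - math.exp(-a)

samples = simulate_z(200_000)
for a in (0.5, 1.0, 2.0):
    empirical = sum(abs(z) < a for z in samples) / len(samples)
    print(a, empirical, prob_abs_less(a))
```

With this many samples the observed frequencies should differ from the theoretical values only in the third decimal place or so.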
T Jo 0 2 


V 


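THEOREM 1. Let $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ determine a 4-variate distribution with distribution function $F(x_1, x_2, x_3, x_4)$. Suppose that $E(\mathbf{x}_i)$, $E(\mathbf{x}_i^4)$, $E(\mathbf{x}_i^2\mathbf{x}_j^2)$ exist, $i, j = 1, \cdots, 4$, and suppose that $E(\mathbf{x}_i) = 0$, $E(\mathbf{x}_i^2) = 1$,¹¹ $i, j = 1, 2, 3, 4$. Let $\mathbf{x}_{1j}, \mathbf{x}_{2j}, \mathbf{x}_{3j}, \mathbf{x}_{4j}$ have the same 4-variate distribution as $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$, $j = 1, \cdots, n$, and let the 4n-variate distribution function of $\{\mathbf{x}_{ij}\}$ be $\prod_{j=1}^{n} F(x_{1j}, x_{2j}, x_{3j}, x_{4j})$. We shall use the following notation (which suppresses the dependence on $n$):

(20)   $\xi_i = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_{ik}, \qquad s_{ij} = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_{ik}\mathbf{x}_{jk}, \qquad \rho_{ij} = E(\mathbf{x}_i\mathbf{x}_j).$

Let $\varphi$ be a function of $\xi_i$, $s_{ij}$, defined in a neighborhood N of P: $\xi_i = 0$, $s_{ij} = \rho_{ij}$, which, together with its second partial derivatives, is continuous in N. Define $\sigma \ge 0$ by

(21)   $\sigma^2 = E\left\{\left[\sum_{i=1}^{4} \frac{\partial\varphi}{\partial\xi_i}\mathbf{x}_i - \sum_{i,j=1}^{4} \frac{\partial\varphi}{\partial s_{ij}}(\rho_{ij} - \mathbf{x}_i\mathbf{x}_j)\right]^2\right\}$

where the partial derivatives are evaluated at P. Then if $\sigma > 0$, the distribution of $\sqrt{n}[\varphi - \varphi(P)]$ (where $\varphi$ has the arguments $\xi_i$, $s_{ij}$) converges to a limiting distribution which is normal with mean 0 and variance $\sigma^2$.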


To prove this theorem we expand $\varphi$ in the neighborhood of P, obtaining

(22)   $\sqrt{n}[\varphi - \varphi(P)] = \sum_{i=1}^{4} \frac{\partial\varphi}{\partial\xi_i}\sqrt{n}\,\xi_i - \sum_{i,j=1}^{4} \frac{\partial\varphi}{\partial s_{ij}}\sqrt{n}\,(\rho_{ij} - s_{ij}) + R_n$

where the partial derivatives are evaluated at P, and where $R_n$ consists of a linear combination of $\sqrt{n}\,\xi_i\xi_j$, $\sqrt{n}\,\xi_i(\rho_{jk} - s_{jk})$, $\sqrt{n}\,(\rho_{ij} - s_{ij})(\rho_{kl} - s_{kl})$, with coefficients which are uniformly bounded as long as $\xi_i$, $s_{ij}$ are in the neighborhood N. Now

(23)   $\lim_{n \to \infty} \xi_i = 0, \qquad \lim_{n \to \infty} s_{ij} = \rho_{ij}$

with probability 1, by the law of large numbers, and as $n$ becomes infinite the distributions of $\sqrt{n}\,\xi_i$, $\sqrt{n}\,(\rho_{ij} - s_{ij})$ converge to limiting distributions, by the

11 The hypothesis that $E(\mathbf{x}_i) = 0$ involves no real restriction, since the general case can be reduced to this one by substituting $\mathbf{x}_i - E(\mathbf{x}_i)$ for $\mathbf{x}_i$. The hypothesis that $E(\mathbf{x}_i^2) = 1$ can be met by substituting $\mathbf{x}_i[E(\mathbf{x}_i^2)]^{-1/2}$ whenever $E(\mathbf{x}_i^2) > 0$, which will always be true unless $\mathbf{x}_i = 0$ with probability 1.













Laplace-Liapounoff theorem. Then by Lemma 1, the terms of $R_n$ converge stochastically to 0. The other terms of $\sqrt{n}[\varphi - \varphi(P)]$ are sums to which the Laplace-Liapounoff theorem can be applied, giving the desired conclusion.

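As an example of the application of this theorem, we suppose that $\varphi$ is a correlation coefficient:

(24)   $\varphi = \frac{s_{12}}{(s_{11}s_{22})^{1/2}}, \qquad \varphi(P) = \rho_{12}.$

Here $\sigma^2$ is $E\{[\mathbf{x}_1\mathbf{x}_2 - \frac{1}{2}\rho_{12}(\mathbf{x}_1^2 + \mathbf{x}_2^2)]^2\}$ (which reduces to the familiar result $(1 - \rho_{12}^2)^2$ when the bivariate distribution of $\mathbf{x}_1, \mathbf{x}_2$ is normal), and $\sigma = 0$ only when, with probability 1,

(25)   $2\mathbf{x}_1\mathbf{x}_2 = \rho_{12}(\mathbf{x}_1^2 + \mathbf{x}_2^2).$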
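
The value of $\sigma^2$ in the normal case can be checked by sampling. The sketch below is only an illustration under stated assumptions: the pair is taken to be bivariate normal with zero means, unit variances and correlation $\rho$, constructed from two independent standard normal values; the sample size, seed, and function name are arbitrary. It estimates $E\{[\mathbf{x}_1\mathbf{x}_2 - \frac{1}{2}\rho(\mathbf{x}_1^2 + \mathbf{x}_2^2)]^2\}$ and compares it with $(1 - \rho^2)^2$.

```python
import math
import random

def sigma_squared_estimate(rho, num_samples=200_000, seed=1):
    """Monte Carlo estimate of E{[x1*x2 - (rho/2)(x1^2 + x2^2)]^2}."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        x1 = u
        x2 = rho * u + math.sqrt(1 - rho ** 2) * v   # correlation of x1, x2 is rho
        total += (x1 * x2 - 0.5 * rho * (x1 ** 2 + x2 ** 2)) ** 2
    return total / num_samples

for rho in (0.0, 0.5, 0.8):
    print(rho, sigma_squared_estimate(rho), (1 - rho ** 2) ** 2)
```

For distributions other than the normal one the same estimate can be formed, but there is no reason to expect it to equal $(1 - \rho^2)^2$.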


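As a second example we suppose that $\varphi$ is a tetrad difference:

(26)   $\varphi = \frac{s_{13}s_{24} - s_{14}s_{23}}{(s_{11}s_{22}s_{33}s_{44})^{1/2}}, \qquad \varphi(P) = \rho_{13}\rho_{24} - \rho_{14}\rho_{23}.$

Here $\sigma^2$ becomes

(27)   $E\left\{\left[\rho_{24}\mathbf{x}_1\mathbf{x}_3 + \rho_{13}\mathbf{x}_2\mathbf{x}_4 - \rho_{14}\mathbf{x}_2\mathbf{x}_3 - \rho_{23}\mathbf{x}_1\mathbf{x}_4 - \frac{\varphi(P)}{2}\sum_{j=1}^{4} \mathbf{x}_j^2\right]^2\right\}$

and $\sigma = 0$ only when the quantity in the brackets vanishes with probability 1.

If in either of the two above cases $s_{ij} - \xi_i\xi_j$ is substituted for $s_{ij}$ (i.e. if the deviations from the sample mean, not those from the true mean, are used), the result is unaltered. This is true in general, since $\frac{\partial\varphi}{\partial\xi_i}$, $\frac{\partial\varphi}{\partial s_{ij}}$ are unaltered at P by this substitution.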

There is a well-known δ-method used in statistics to find limiting variances of statistics of the type covered by Theorem 1,¹² and Theorem 1 shows an interpretation which can be given to the results obtained by this method.

We now investigate the necessary modification of Theorem 1 if $\sigma = 0$, i.e. if

(28)   $\sum_{i=1}^{4} \frac{\partial\varphi}{\partial\xi_i}\mathbf{x}_i - \sum_{i,j=1}^{4} \frac{\partial\varphi}{\partial s_{ij}}(\rho_{ij} - \mathbf{x}_i\mathbf{x}_j) = 0$

with probability 1. If we assume that $\varphi$ has continuous third partial derivatives in the neighborhood N, we find that


12 Examples of the use of this method can be found in T. L. Kelley, Crossroads in The 
Mind of Man, Stanford University (1928), pp. 49-50, and in an article by S. Wright, Annals 
of Mathematical Statistics, Vol. 5 (1934), p. 211. 


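(29)   $n[\varphi - \varphi(P)] = \frac{n}{2}\sum_{i,j} \frac{\partial^2\varphi}{\partial\xi_i\,\partial\xi_j}\xi_i\xi_j + n\sum_{i,j,k} \frac{\partial^2\varphi}{\partial\xi_i\,\partial s_{jk}}\xi_i(s_{jk} - \rho_{jk}) + \frac{n}{2}\sum_{i,j,k,l} \frac{\partial^2\varphi}{\partial s_{ij}\,\partial s_{kl}}(s_{ij} - \rho_{ij})(s_{kl} - \rho_{kl}) + R_n',$

where $R_n'$ converges stochastically to 0. The second degree terms constitute a quadratic form in $\{\xi_i, s_{jk} - \rho_{jk}\}$. Now the multivariate distribution of $\{\sqrt{n}\,\xi_i, \sqrt{n}\,(s_{jk} - \rho_{jk})\}$, by the Laplace-Liapounoff theorem, converges to a normal distribution whose variances and correlation coefficients are those of $\mathbf{x}_i$, $\mathbf{x}_j\mathbf{x}_k$. The distribution of $n[\varphi - \varphi(P)]$ thus converges to the distribution of the quadratic form

(30)   $\frac{1}{2}\sum_{i,j} \frac{\partial^2\varphi}{\partial\xi_i\,\partial\xi_j}\alpha_i\alpha_j + \sum_{i,j,k} \frac{\partial^2\varphi}{\partial\xi_i\,\partial s_{jk}}\alpha_i\beta_{jk} + \frac{1}{2}\sum_{i,j,k,l} \frac{\partial^2\varphi}{\partial s_{ij}\,\partial s_{kl}}\beta_{ij}\beta_{kl}$

where $\{\alpha_i, \beta_{jk}\}$ have the multivariate distribution just described, unless the quadratic form vanishes identically. This reasoning can be continued, the general result being that there is some power $\nu$ of $n$, if $\varphi$ is sufficiently regular, such that the distribution of $n^{\nu}[\varphi - \varphi(P)]$ converges to a limiting distribution.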

When $\sigma = 0$ in the second example, unless the distribution of $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ is confined with probability 1 to a 4-dimensional quadric, $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$. Equation (29) becomes

(29')   $n[\varphi - \varphi(P)] = n(s_{13}s_{24} - s_{14}s_{23}) + R_n'.$


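Now if $\mathbf{x}_1, \mathbf{x}_2$ are transformed by a linear homogeneous transformation with determinant $\Delta$, it is readily seen that $s_{13}s_{24} - s_{14}s_{23}$ is multiplied by $\Delta$. The same is true of $\mathbf{x}_3, \mathbf{x}_4$. If $\mathbf{x}_1, \mathbf{x}_2$ are transformed into $\mathbf{x}_1', \mathbf{x}_2'$ so that $E(\mathbf{x}_i'^2) = 1$, $E(\mathbf{x}_1'\mathbf{x}_2') = 0$, the determinant of the transformation is $\pm(1 - \rho_{12}^2)^{-1/2}$. Then transforming each pair $(\mathbf{x}_1, \mathbf{x}_2)$, $(\mathbf{x}_3, \mathbf{x}_4)$ in this way into $(\mathbf{x}_1', \mathbf{x}_2')$, $(\mathbf{x}_3', \mathbf{x}_4')$, the variables $\mathbf{x}_1', \mathbf{x}_2', \mathbf{x}_3', \mathbf{x}_4'$ are uncorrelated. If $s_{ij}' = \frac{1}{n}\sum_{k=1}^{n} \mathbf{x}_{ik}'\mathbf{x}_{jk}'$,

(31)   $s_{13}s_{24} - s_{14}s_{23} = \pm(1 - \rho_{12}^2)^{1/2}(1 - \rho_{34}^2)^{1/2}\,(s_{13}'s_{24}' - s_{14}'s_{23}').$

The limiting distribution of $n(s_{13}'s_{24}' - s_{14}'s_{23}')$ is the distribution of $\beta_{13}'\beta_{24}' - \beta_{14}'\beta_{23}'$ where these four chance variables are normally distributed, $E(\beta_{13}') = E(\beta_{24}') = E(\beta_{14}') = E(\beta_{23}') = 0$, $E(\beta_{ij}'^2) = E(\mathbf{x}_i'^2\mathbf{x}_j'^2)$, $E(\beta_{ij}'\beta_{kl}') = E(\mathbf{x}_i'\mathbf{x}_j'\mathbf{x}_k'\mathbf{x}_l')$. Now if $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ are normally distributed (the most important case for statistical purposes), $\mathbf{x}_1', \mathbf{x}_2', \mathbf{x}_3', \mathbf{x}_4'$ will also be distributed normally, and the vanishing of the correlation coefficients means that the chance variables are independent. If this is true

(32)   $E(\beta_{ij}'^2) = 1, \qquad E(\beta_{ij}'\beta_{kl}') = 0 \quad (\beta_{ij}' \ne \beta_{kl}').$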







Evidently, however, $\mathbf{x}_1', \mathbf{x}_2', \mathbf{x}_3', \mathbf{x}_4'$ do not have to be independent to make these equations valid. It is more than sufficient if the pairs $(\mathbf{x}_1, \mathbf{x}_2)$, $(\mathbf{x}_3, \mathbf{x}_4)$ and therefore the pairs $(\mathbf{x}_1', \mathbf{x}_2')$, $(\mathbf{x}_3', \mathbf{x}_4')$ are independent. If (32) is true, the $\beta'$'s are independent, each one being normally distributed with mean 0 and variance 1. Summarizing these results, and using Lemma 3: if $\varphi$ is the tetrad difference and if $\rho_{13} = \rho_{14} = \rho_{23} = \rho_{24} = 0$, the distribution of $n[\varphi - \varphi(P)]$ converges to a limiting distribution. If in addition the distribution of $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ is normal, or if the pairs $(\mathbf{x}_1, \mathbf{x}_2)$, $(\mathbf{x}_3, \mathbf{x}_4)$ are independent, this limiting distribution has density function

$\frac{1}{2c} e^{-|x|/c}$

where $c = (1 - \rho_{12}^2)^{1/2}(1 - \rho_{34}^2)^{1/2}$.

Wilks has investigated the case where $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$ are normally and independently distributed, and in this case found the exact variance of the tetrad difference as a function of $n$.¹³


COLUMBIA UNIVERSITY.


13 Proceedings of the National Academy of Sciences, Vol. 18, (1932), pp. 562-565.





